Abstract

This  research  characterized  the  effects  of  three  species  of  wetland  plant  on  the 
composition  and  diversity  of  the  rhizosphere  bacterial  communities  they  supported. 
Diversity and community composition were addressed in relation to three factors: plant
presence, plant species, and soil depth; these factors helped characterize the microbial
community of subsurface flow wetlands and its remediation potential. The largest
sample of 16S rRNA gene sequences known to have been collected to date is described
here; it enabled comparisons of the effects of plant presence or absence, plant
species, and rhizosphere depth on microbial diversity and community composition,
using newly developed software packages. The plant rhizosphere supported a more
diverse microbial community than plant-free soil. There was also evidence that the
Eleocharis erythropoda microbial community was significantly more diverse than the
Carex comosa community, but not significantly more diverse than the Scirpus
atrovirens community. Samples were taken from top, middle, and bottom layers. While
depth did not appear to affect diversity overall, one of the three plant species
supported a less diverse community at its middle depth than the other two plants.
This finding was consistent with a previous wetland study, and is significant
because wetlands planted with this species may support a less diverse microbial
community. Differences in community composition, based on RDP phylum
classifications, were not significant for any of the comparisons.

MOLECULAR CHARACTERIZATION OF WETLAND SOIL BACTERIAL COMMUNITIES IN CONSTRUCTED MESOCOSMS

Chapter  I:  Introduction 

This  research  focused  on  mesocosms  constructed  to  investigate  the 
rhizosphere  bacterial  community  associated  with  a  constructed  wetland  at  Wright- 
Patterson  Air  Force  Base  (WPAFB),  Ohio.  The  wetland  was  built  in  2000  to  treat 
groundwater contaminated with tetrachloroethylene (PCE) and trichloroethylene (TCE).
Twelve  mesocosms  were  constructed  to  simulate  the  subsurface  flow  of  the  wetland,  and 
were  housed  at  the  Wright  State  University  (WSU)  greenhouse  in  Dayton,  OH.  The 
mesocosm  design  is  thoroughly  explained  in  Chapter  III  of  this  thesis.  Nine  of  the  12 
mesocosms  were  planted  with  common  wetland  plants  used  in  the  constructed  wetland, 
and  three  unplanted  mesocosms  served  as  controls.  Three  mesocosms  were  planted  with 
Eleocharis  erythropoda  (Spike  Rush),  two  were  planted  with  Carex  comosa  (Bearded 
Sedge),  and  four  were  planted  with  Scirpus  atrovirens  (Green  Bulrush)  (Yan  2006).  The 
initial  intent  was  to  evenly  distribute  the  plant  species  over  the  nine  mesocosms; 
however, due to a mistake in identifying the plants during their collection, the distribution
was  not  even. 

The  need  for  less  expensive  and  more  efficient  remediation  techniques  has  driven 
a  strong  interest  in  bioremediation.  Remediation  using  various  microbial  processes  has 
been  the  focal  point  of  many  research  projects,  but  little  is  known  about  the  morphology 
and  functionality  of  microbial  consortia  that  perform  bioremediation.  In  order  to 
completely  understand  and  control  biological  remediation,  engineers  need  to  understand 
how organisms within the system operate.

Since the vast majority of microorganisms cannot be grown under isolated
conditions, and therefore cannot be studied directly, this understanding and control has
not yet been achieved. An estimated 1% of microorganisms have been isolated using
traditional laboratory culture methods (Pace 2008, Schloss & Handelsman 2006,
Kowalchuk 2002). New molecular methodologies, such as 16S rRNA gene analysis,
allow examination of the elusive 99% of organisms that remain uncultured by examining
the organisms’ DNA sequences. Numerous studies of this nature have been conducted in the
field or in microcosms (Grayston 1998, Kowalchuk 2002). This is the first study of its
kind to apply molecular tools to the study of microbial communities in mesocosms.

Research on wetlands constructed for the purpose of water treatment is relatively
new. In 1973, the first pilot-scale constructed wetland treatment system was established,
combining a marsh wetland, a pond, and a meadow in series (Kadlec & Knight 1996).
However,  the  intricate  interactions  and  relationships  between  the  microbial  communities 
and  the  plant  life  in  a  treatment  wetland  have  not  been  thoroughly  examined  (Stottmeister 
2003). 

Microbial degradation of a contaminant, such as PCE or TCE, takes place
because microorganisms use the contaminant as an electron donor (carbon source) or as
an electron acceptor (oxidant). This promotes the organism’s growth and ultimately its
survival  (Fields  2004).  However,  microbes  do  not  execute  degradation  without  outside 
support.  Soil  is  the  main  supporting  material  for  plant  growth,  which  in  turn  provides  the 
structure  and  environment  for  microbial  growth.  These  three  constituents  work  in  a 
delicate  balance  toward  the  ultimate  outcome  of  bioremediation,  and  understanding  this 
balance is of major interest to researchers (Stottmeister 2003).

Numerous studies have concentrated on soil properties associated with different
species  of  plants,  and  plant  growth  and  survival  in  different  soil  types  (Kennedy  1995, 
Grayston  1998,  Bardgett  1999,  Meithling  2000,  Yan  2006,  Bezemer  2006).  Those 
studies  also  looked  at  the  composition  of  the  microbial  community.  All  of  the  studies 
used  general  methods,  such  as  substrate  utilization,  to  identify  functional  groups  of 
bacteria,  and  identification  based  on  metabolic  profiles,  rather  than  molecular 
technologies,  to  determine  the  composition  (Kennedy  1995,  Grayston  1998).  Still  other 
studies characterized the effects plant species diversity has had on a particular microbial
functional  group,  like  ammonia  oxidizers  (Kowalchuk  2000). 

Studies have characterized microbial communities in different environments
based on molecular technology; however, sample sizes are typically small compared to the
large sample size presented here. Borneman et al. (1996) surveyed the microbial
diversity of an agricultural soil in Wisconsin. They used 124 16S rRNA gene sequences
in their research, and analyzed the sequences using the Basic Local Alignment Search
Tool (BLAST), described later. Major Ethan Bishop
used 357 sequences and analyzed them using BLAST and EstimateS
(http://viceroy.eeb.uconn.edu/EstimateS). EstimateS calculates diversity parameters and
allowed for complete analysis of the sample sequences; however, the sample size was
extremely small (Bishop 2006). Other studies have used between 100 and 686 sequences
for analysis of microbial communities and their diversity (Liu 1997, McGarvey 2004,
Jannsen 2006). This study used 3,099 sequences for composition analysis, and 2,820
sequences for diversity parameter analysis; it is the largest known collection of
sequences, or community, to date.

The software packages used to analyze the data from the 16S rRNA gene analyses
were the Ribosomal Database Project (RDP) version 9.57 Classifier and Aligner
programs, the PHYLogeny Inference Package (Phylip) version 3.2, and Distance-based
Operational Taxonomic Unit and Richness determination (DOTUR) version 1.53.
These software packages are described in detail in the literature review. They
allowed characterization of the entire microbial community into phyla, and produced
parameters that described the diversity (richness and evenness) of each community.
Therefore,  we  were  able  to  compare  communities,  and  note  any  effect  on  the  diversity  or 
composition  of  the  microbial  community.  This  information  could  be  used  to  make 
inferences  about  the  makeup  of  the  actual  wetland  microbial  community  and  its 
remediation  potential.  This  research  provides  a  baseline  that  will  be  used  for  comparison 
to  subsequent  contaminated  mesocosm  research  and  research  specifically  designed  to 
investigate  the  trends  identified  here. 

Research  Objectives 

The  primary  objectives  of  this  research  were  to: 

1. Determine the effects of plant presence on microbial diversity and community
composition. 

2.  Determine  the  effects  of  plant  species  on  microbial  diversity  and  community 
composition. 

3.  Determine  the  effects  of  subsurface  flow  soil  depth  on  microbial  diversity  and 
community  composition. 

The results of this research help define the relationships between microbial
community diversity and plant species, and between microbial community diversity and
depth in soil that is continuously saturated with water and experiences subsurface flow.
Most importantly, they help determine the impact of plant presence on the microbial
community. This research provides useful information for the design and construction of
appropriate and efficient wetlands to biodegrade PCE and TCE.

Chapter II: Literature Review


This chapter reviews the literature that supports the major objectives of this
research. First, the fundamental basis of the plant and microbial interactions that take
place in treatment wetlands is discussed. Then, the 16S rRNA gene analysis method and its
background are discussed. Finally, the software packages used in calculating the various
diversity parameters used in analysis are introduced, and their capabilities and
limitations discussed.

Treatment  Wetlands  and  Microbial/Plant  Interactions 

Natural wetlands filtered groundwater long before humans began constructing
artificial ones (Kadlec & Knight 1996; Stottmeister 2003). Constructed wetlands have
been  established  throughout  the  world  to  clean  contamination,  such  as  PCE  and  TCE, 
since  the  work  of  Kathe  Seidel  in  the  1960s  (Stottmeister  2003).  However,  the  intricate 
interactions  between  the  microbial  communities  that  drive  the  degradation  and  the  abiotic 
influences  in  the  wetland  environment  are  not  well  understood.  Nevertheless,  it  is  widely 
accepted  that  the  microorganisms  in  a  wetland  transform  contaminants,  such  as  PCE  and 
TCE,  into  innocuous  constituents  (Kadlec  &  Knight  1996,  Stottmeister  2003). 

This  research  was  intended  to  identify  three  factors  that  affect  microbial 
communities  in  soil.  Some  researchers  are  convinced  that  the  soil  properties  are  the  key 
to  understanding  the  degradation  properties  of  microbial  communities  in  treatment 
wetlands.  They  hypothesize  that  the  soil  provides  the  environment  for  certain  plants  to 
grow, and, in turn, the associated microbial community can flourish (Marrs 1991,
Marschner 2001). However, studies have also shown a direct relationship between plant
species and associated microbial communities, and some researchers believe that plant
species influence the associated microbial community more than the type of soil in
a treatment wetland (Grayston 1998, Meithling 2000, Bezemer 2006).

Plants  that  survive  in  a  wetland  environment  have  adapted  features.  The  plants 
are  able  to  survive  in  environments  that  are  flooded  at  least  part  of  the  year.  All  plants 
require  water  for  survival,  but  excess  water  is  a  stressor.  Therefore,  wetland  plants  have 
two  adaptations  that  allow  their  survival  in  a  stressed  wetland  environment.  The  first  is 
aerenchymous  plant  tissues.  This  tissue  allows  transport  of  gases  such  as  oxygen  from 
the  atmosphere  to  the  root  zone,  or  rhizosphere.  The  second  adaptation  is  the  generation 
of  adventitious  roots  from  flooded  stem  tissue.  This  allows  extraction  of  dissolved 
oxygen  and  other  nutrients  for  use  by  the  plant  from  the  surrounding  environment 
(Kadlec & Knight 1996, Stottmeister 2003). Oxygen not used by the plant for respiration
is released into the rhizosphere and other parts of the root system. This forms a protective
layer around the root surface, which continuously counterbalances the chemical and
biological oxygen demand in the soil (Stottmeister 2003). The release rate of oxygen
and other nutrients is specific to each plant species (Kadlec & Knight 1996).

The flow of oxygen in a plant is driven by diffusion and convective processes.
The types and degree of these mechanisms are specific to each plant species. Flooded
soils  are  oxygen  deprived  (Stottmeister  2003);  however,  plants  are  able  to  provide 
oxygen  deep  into  the  rhizosphere.  The  rhizosphere  is  divided  into  two  distinct  regions. 
The  endorhizosphere  is  the  interior  root  zone,  and  the  ectorhizosphere  is  the  root’s 
surroundings.  The  area  where  they  meet  is  referred  to  as  the  rhizoplane,  and  this  area  is 
the  site  of  the  most  intensive  interactions  between  plants,  soil,  and  microbes  (Stottmeister 
2003).

Since the exudates from a plant’s rhizosphere have been shown to influence
microbial composition and performance, it is plausible that the microbial
communities associated with different plant species will differ
(Stottmeister 2003). In a constructed wetland the main role of degradation lies with the
microorganisms,  not  the  plants.  However,  the  plants  do  have  an  effect  on  the  associated 
microbial  community. 

In  this  study,  the  microbial  communities  associated  with  three  typical  wetland 
plants  were  investigated.  There  are  numerous  studies  showing  the  properties  that  various 
plants  bring  to  a  wetland  (Grayston  1998,  Stottmeister  2003,  Bezemer  2006).  However, 
there  are  relatively  few  studies  that  examine  how  plants  affect  the  detailed  microbial 
community  composition  and  diversity.  It  is  generally  accepted  that  plants  increase  the 
diversity of a microbial community; however, no one has specifically attempted an
in-depth study concerning this matter.

This  project  used  mesocosms  to  establish  microbial  communities  for  each  of  three 
species  of  plants.  The  plants  selected  were  Eleocharis  erythropoda,  Carex  comosa,  and 
Scirpus  atrovirens.  All  of  these  plants  are  in  the  phylum  Tracheophyta  (vascular  plants), 
class  Angiospermae  (flowering  plants)  and  further  divided  into  Monocotyledonae 
(monocots). All of the plants chosen for this project have an emergent herb growth
habit,  which  means  that  most  of  the  above-ground  part  of  the  plant  emerges  above  the 
water  line  in  the  wetland.  This  is  an  important  trait  because  emergent  plants  provide 
surface  area  for  microbial  growth  (Kadlec  &  Knight  1996).  The  studies  that  investigated 
plant  species’  effects  on  soil  properties  noted  that  plants  with  similar  growth  habits  and 
taxonomy typically produce similar soil property effects (Kadlec & Knight 1996,
Bezemer 2006). Therefore, it is reasonable to assume that the microbial community
associated  with  these  similar  species  of  plants  will  only  differ  due  to  a  specific  property 
of  the  plant’s  rhizosphere,  and  not  because  of  an  indirect  effect  the  plant  has  on  soil 
properties. 

Soil  Microbial  Diversity  and  Diversity  Statistics 

A  soil’s  microbial  community  cannot  be  exhaustively  sampled;  therefore,  samples 
must  be  used  to  estimate  the  actual  diversity  of  organisms  in  that  environment.  Diversity 
consists  of  richness  and  evenness.  Species  richness  is  defined  as  the  number  of  different 
units present in a community (Nübel 1999). The classification of a unit can be taken as a
species,  class,  or  other  biological  level,  depending  on  the  intent  of  the  study.  For 
microorganisms,  it  is  particularly  difficult  to  define  a  unit.  Definite  criteria  have  not  been 
published.  However,  if  the  unit  definition  stays  consistent  throughout  a  particular  study, 
and  is  adequately  documented,  it  does  not  become  a  problem  in  analyzing  data  (Hughes 
2001).  Evenness  is  considered  the  relative  distribution  of  individuals  among  certain 
predefined  units,  such  as  a  species.  Both  of  these  components  are  investigated  in  this 
project. 
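
The two components of diversity defined above can be illustrated with a short sketch. This is only an illustration under stated assumptions: the text does not specify which evenness index was used, so the example assumes the Shannon index with Pielou's evenness (Shannon diversity divided by the logarithm of richness), and the function name and unit labels are hypothetical.

```python
import math
from collections import Counter


def richness_and_evenness(unit_labels):
    """Return (richness, shannon, evenness) for a list of unit labels.

    Assumption: evenness is Pielou's J = H / ln(S), where H is the Shannon
    index and S is the richness. The source does not name a specific index.
    """
    counts = Counter(unit_labels)
    richness = len(counts)                      # number of distinct units
    total = sum(counts.values())
    shannon = -sum((n / total) * math.log(n / total) for n in counts.values())
    # Pielou's J is undefined when only one unit is present.
    evenness = shannon / math.log(richness) if richness > 1 else None
    return richness, shannon, evenness


# Hypothetical sample: ten individuals split evenly between two units.
even_sample = ["unit_A"] * 5 + ["unit_B"] * 5
# Hypothetical sample: same richness, but dominated by one unit.
skewed_sample = ["unit_A"] * 9 + ["unit_B"]

print(richness_and_evenness(even_sample))
print(richness_and_evenness(skewed_sample))
```

Both hypothetical samples have the same richness, but the second is less even, which is why the two components are reported separately.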

Diversity  can  be  positively  linked  to  productivity  of  a  community.  However, 
microbial  diversity  is  very  hard  to  quantify  because  the  tested  sample  will  be  a  small 
subset  of  the  site’s  actual  population.  It  might  not  be  fully  representative  of  the 
population  at  large.  Nonetheless,  the  estimators  for  comparative  analysis  described 
below  have  been  applied  to  the  microbial  world.  The  estimators  used  for  this  project  are 
described in detail later in this section. The correlation of the estimators to the new
molecular techniques has not been evaluated, but their use does show promise (Nübel
1999). For this project, the main goal was to document the change in microbial
community diversity across depth gradients, among plant species, and between planted
and unplanted soils. To answer these questions only relative diversities are required.
Therefore, the various diversity statistics were used for analysis (Hughes 2001).

16S  rRNA  Gene  Analysis  Method 

Biologically  defining  organisms  with  molecular  technology  uses  the  concept  of 
phylogeny.  A  molecular  basis  for  this  concept  was  introduced  by  Olsen  and  Woese  in 
1993.  This  concept  stated  that  the  majority  of  essential  genes  in  a  genome  share  a 
common  heritage  or  evolutionary  history.  A  gene  mutates  over  time.  Theoretically,  this 
change  can  be  measured;  however,  the  original  state  of  an  organism  remains  unknown. 
Therefore,  biologists  assume  that  two  versions  of  a  gene  sequence  originate  from  the 
same  ancestry.  Their  sequence  difference  can  be  measured  and  compared,  and  ultimately 
the  relation  between  two  sequences  can  be  established  (Woese  1987).  This  is  referred  to 
as  an  organism’s  evolutionary  distance. 

The  process  of  selecting  a  gene  to  be  used  for  determining  evolutionary 
relationships  can  be  streamlined  by  focusing  on  genes  that  perform  a  central  function  and 
are  intimately  involved  in  the  cell’s  activity.  Several  genes  fit  this  description:  rRNA, 
RNA  polymerase,  elongation  factor  G,  proton-translocating  ATPases,  and  others  (Olsen 
1993).  Since  several  genes  can  be  used,  other  criteria  must  be  considered.  A  particular 
gene  must  provide  enough  appropriate  information  for  analysis.  In  most  cases,  the  goal 
of  these  research  projects  is  to  identify  the  properties  and  makeup  of  a  consortium  of 
microorganisms from a particular environmental sample, such as soil. Therefore, the
gene chosen must be evolutionarily linked to its relatives and be variable enough to
distinguish  between  unique  species  (Woese  1987,  Clarridge  2004). 

rRNA  is  a  key  element  of  the  cell’s  protein  synthesis  process,  and  thus  is 
functionally and evolutionarily homologous in all organisms. In bacteria there are three
different rRNAs: 5S, which is ~120 nucleotides; 16S, which is ~1550 nucleotides; and
23S, which is ~3000 nucleotides (Woese 1987; Olsen 1986; Clarridge 2004). The exact
nucleotide length varies among organisms, and the aforementioned lengths are averages. The
5S and 23S rRNAs were found to be inappropriate molecular tools for the analysis of
microbial  communities.  The  5S  rRNA  was  not  long  enough  to  provide  adequate 
information  or  detail  to  make  an  accurate  comparison  tool  (Woese  1987).  The  23S  rRNA 
was  too  large  a  molecule,  and  little  research  has  been  directed  into  using  it  for  genetic 
analysis.  Therefore  neither  has  been  chosen  in  typical  research  methodologies  (Olsen 
1986).  The  most  widely  studied  gene  is  the  16S  rRNA  gene  (Schloss  2006). 

The  16S  rRNA  gene  is  large  enough  to  have  conserved  sequences,  which  are 
identical  or  nearly  identical  in  all  bacteria,  and  variable  regions.  The  variable  regions 
provide  distinguishing  and  statistically  valid  measurements  of  evolutionary  distances,  and 
thereby  of  “species”  or  other  levels  of  classifications  of  bacteria  (Clarridge  2004). 
Regions within the 16S rRNA gene are less affected by reconfigurations that occur in the
genome,  and  maintain  a  highly  conserved  picture  of  the  organism’s  evolutionary  history 
(Olsen  1993).  This  is  largely  due  to  the  fact  that  rRNA  is  a  critical  component  of  the 
cell’s  function. 

In  cases  requiring  detail,  such  as  describing  a  new  species,  it  is  appropriate  to 
sequence the entire 16S rRNA gene multiple times. Also, for research to distinguish
between specific taxa or strains, sequencing the entire gene would be most appropriate.
For  descriptions  of  microbial  communities,  the  16S  rRNA  gene  is  used  in  two  basic 
ways.  The  entire  ~1550  base  pair  (bp)  length  is  sequenced  when  relatively  few  microbes 
are  analyzed,  or  a  smaller  5’,  500  bp  region  is  used  when  sampling  larger  and  more 
diverse  communities.  The  first  500  bp  provide  sufficient  information  and  differentiation 
to distinguish separate organisms, though not always to specifically denote genus and
species.  Furthermore,  the  first  500  bp  region  has  been  shown  to  hold  a  higher  percentage 
of  diversity  than  any  other  region.  Clarridge  et  al.  compared  100  organisms  using  the 
1550  bp  sequence  or  the  500  bp  sequences  and  found  the  relationships  to  be  highly 
similar  (Clarridge  2004).  Since  the  goal  of  this  thesis  project  was  to  differentiate 
between  organisms  and  not  to  identify  new  species,  and  an  extremely  large  sample  set 
was  generated,  use  of  the  500  bp  portion  of  the  gene  was  justified. 

In 1977, Woese et al. used the rRNA gene to completely transform the
nomenclature of living organisms. Traditionally, living organisms had been classified
into two distinct domains: Prokaryotae and Eukaryotae. However, as molecular genetics
became a more common area of research, living organisms’ genomes were investigated,
and the traditional nomenclature became obsolete. Woese et al. used the rRNA gene to
classify living organisms into three new classifications called urkingdoms. The first was
the urkingdom eubacteria, which includes all typical bacteria. The second was the
urkaryotes, which was defined by the 18S rRNAs of the eukaryotic cytoplasm. Both of
these corresponded nicely to the traditional groupings of Prokaryote and Eukaryote.
However, a third classification was also introduced. The Archaebacteria appear to be no
more related to the typical bacteria than they are to eukaryotes. Investigating the genetic
makeup of organisms has unlocked an entirely new classification system (Woese 1977).

16S  rRNA  gene  analysis  was  chosen  as  the  appropriate  molecular  tool  for  the 
mesocosm study in this thesis. The steps in this analysis are fairly straightforward: first,
DNA extraction from mesocosm soils; second, Polymerase Chain Reaction (PCR) to amplify
16S rRNA gene sequences within the DNA extract; third, cloning of the amplified 16S rRNA
products; fourth, sequencing of the products; and finally, comparative analysis of the
retrieved sequences (Bishop 2006). The sampling methodology is explained in greater
detail  in  the  next  chapter  and  by  Bishop  (2006).  A  full  and  detailed  summary  of  the  PCR 
method  used  is  included  in  Appendix  A.  The  PCR  reactions  generate  a  heterogeneous 
mixture  of  16S  rRNA  sequences.  It  is  therefore  necessary  to  clone  individual  molecules 
in  order  to  isolate  them  for  sequencing.  This  step  had  the  added  benefit  of  ensuring 
adequate  concentrations  of  high-quality  DNA.  The  exact  procedures  for  all  processes  are 
explained  in  the  next  chapter  and  the  appendices. 

The choice of appropriate primers to amplify the ~500 bp, 5’ section of the 16S
rRNA gene was highly dependent on the project’s research goals. In this project, the goal
was to identify and differentiate as many bacteria as possible from the mesocosm soil
samples. Therefore, primers constructed from the conserved regions at the beginning of
the gene and at the ~540 bp region were used (Clarridge 2004). These primers are often
referred  to  as  “universal”  because  they  are  built  from  the  conserved  regions  that  all 
bacteria  have.  However,  no  primer  can  be  designed  to  completely  anneal  to  all  bacteria 
since  there  is  variability  between  bacteria  and  other  organisms  (Baker  2003).  The 
“universal” primers used in this project introduce bias into the results, because they are
designed to anneal to bacterial 16S rRNA genes, but can anneal to genes from other organisms
that  are  not  within  the  domain  Bacteria.  Furthermore,  they  may  not  anneal  well  to  the 
16S  rRNA  genes  of  some  bacteria.  This  will  be  discussed  further  in  the  Methodology 
section  of  this  thesis. 

RDP  and  Alignment 

RDP  provides  ribosome  related  data  and  services  to  the  scientific  community, 
including  online  data  analysis  and  aligned  and  annotated  bacterial  small-subunit  16S 
rRNA  sequences.  RDP  had  451,545  rRNA  subunit  sequences  as  of  November  8,  2007. 
RDP  has  several  functions  that  are  available  to  the  online  user.  Studies  have  used  RDP 
primarily to classify sequences into phyla using its Classifier function. Nercessian et al.
and Ben-Dov et al. are examples of studies that applied RDP in their analyses.
Nercessian identified bacterial populations active in metabolism of C1 compounds in the
sediment of a Washington state lake. The RDP Classifier was used to define affiliations to
known phylogenetic groups (Nercessian 2005). Eitan Ben-Dov attempted to show the
advantage of using inosine at the 3’ termini of 16S rRNA gene universal primers for the
study of microbial diversity. He used the RDP Classifier to assign 16S rRNA sequences to a
taxonomical hierarchy (Ben-Dov 2006).

In  this  project,  RDP  was  used  for  three  important  steps.  RDP  was  used  to  assist  in 
the  trimming  and  editing  process,  described  in  detail  in  Chapter  III.  RDP  was  also  used 
to assign sequences to particular phyla with the RDP Classifier program, using an 80%
confidence threshold for each assignment. Finally, RDP was used to align the
sequences  used  in  the  DOTUR  analysis. 
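
The phylum assignment step can be sketched as a simple threshold filter. This is a minimal illustration, not the RDP Classifier itself: the input format (a list of rank, name, confidence tuples per sequence), the function names, and the sample lineages are all hypothetical, and only the 80% threshold comes from the text.

```python
from collections import Counter


def assign_phylum(lineage, threshold=0.80):
    """Return the phylum name if its confidence meets the threshold.

    `lineage` is a hypothetical list of (rank, name, confidence) tuples;
    it is NOT the actual output format of the RDP Classifier.
    """
    for rank, name, confidence in lineage:
        if rank == "phylum":
            return name if confidence >= threshold else "unclassified"
    return "unclassified"  # no phylum-level entry was present


def phylum_composition(lineages, threshold=0.80):
    """Tally phylum assignments across many sequences."""
    return Counter(assign_phylum(lineage, threshold) for lineage in lineages)


# Hypothetical classifier results for three sequences.
example = [
    [("domain", "Bacteria", 1.00), ("phylum", "Proteobacteria", 0.95)],
    [("domain", "Bacteria", 0.99), ("phylum", "Acidobacteria", 0.62)],
    [("domain", "Bacteria", 1.00), ("phylum", "Proteobacteria", 0.81)],
]
print(phylum_composition(example))
```

Raising the threshold trades fewer doubtful assignments for a larger unclassified group, so composition comparisons are only meaningful when every community is processed with the same threshold.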
This project initially had 3,099 sequences for RDP analysis. The online aligners,
such  as  ClustalW  and  Alignment  App,  were  not  capable  of  handling  this  number  of 
sequences.  RDP  added  an  aligner  as  a  part  of  its  services,  and  it  was  able  to  handle  this 
project’s  data  set  (Cole  2003).  The  sequence  alignment  was  crucial  to  identify  regions  of 
similarity  across  the  entire  group  of  sequences  so  that  homologous  residues  appear  in  the 
same  column  of  alignment.  It  is  assumed  that  similar  residues  are  descended  from  the 
same  common  ancestral  gene,  and  to  the  extent  that  assumption  is  incorrect,  the 
alignment,  and  conclusions  of  the  analysis  lose  justification  (Olsen  1993). 
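
The idea that an alignment places homologous residues in the same column can be illustrated with a small sketch that scores each column of an already aligned set of sequences. It does not perform alignment and is unrelated to how the RDP aligner works internally; the function name, the gap character "-", and the toy sequences are assumptions made for illustration.

```python
def column_conservation(aligned):
    """Fraction of sequences sharing the most common residue in each column.

    Assumes all sequences are already aligned to equal length and that "-"
    marks a gap. Gaps count against conservation in this sketch.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for column in zip(*aligned):          # iterate over alignment columns
        residues = [c for c in column if c != "-"]
        if not residues:                  # column made up entirely of gaps
            scores.append(0.0)
            continue
        most_common = max(set(residues), key=residues.count)
        scores.append(residues.count(most_common) / len(column))
    return scores


# Toy alignment of three hypothetical sequences.
toy = ["ACGT-A", "ACGTTA", "ACCT-A"]
print(column_conservation(toy))
```

Columns that score near 1.0 correspond to conserved positions, while lower scores mark variable positions or positions where an insertion or deletion forced gaps into the alignment.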

In a recent study, Wong et al. investigated aligner limitations. They used seven
prominent aligner programs: ClustalW, Muscle, T-Coffee, Dialign 2, Mafft, Dca, and
ProbCons in their investigation. They found that 46.2% of the data had one or more
differing tree phylogenies depending on the aligner used. They concluded that the
inconsistencies were not due to the alignment procedures but rather to the processes of
substitution, insertion, and deletion that make some sequences hard to align.
However, many biologists do not incorporate aligner uncertainty because they accept that
their alignment procedure was carefully constructed by the provider (Wong 2008). This
was the position accepted in this research.

Comparative  Analysis  and  Software 

Once  the  alignment  was  completed,  richness  parameters  and  evenness  were 
calculated,  based  on  the  evolutionary  distance  between  the  sequences.  Evolutionary 
distances  were  determined  using  a  program  called  Phylip,  version  3.2,  which  was 
introduced in an online form in mid-1995. This package had several functions, but most
importantly, it had the ability to compute evolutionary distances between nucleic acid
sequences and form a distance matrix through its DNADIST function using the
Jukes-Cantor method (Felsenstein 2005). In Chapter 12 of Bioinformatics Methods and
Protocols, edited by Misener and Krawetz, Retief calls Phylip an extensive tool that covers
every method of phylogenetic analysis up to 1999 (Retief 1999). A study by McGlynn et
al.,  describes  using  Phylip  to  determine  if  distinct  evolutionary  pathways  of  tumors  exist 
over  time  (McGlynn  2002).  Even  with  the  many  tools  Phylip  has  to  offer,  some  of  its 
components  are  becoming  obsolete.  The  DNADIST  tool  is  not  obsolete,  and  is  still  in 
widespread  use. 
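
The Jukes-Cantor correction mentioned above can be illustrated with a minimal sketch. It is not the DNADIST source code; the function names are hypothetical, and the choice to skip alignment columns containing gaps or ambiguous bases is an assumption made here for simplicity.

```python
import math


def p_distance(seq_a, seq_b):
    """Fraction of differing sites between two aligned sequences.

    Assumption for this sketch: columns where either sequence has a gap
    or an ambiguous base (anything other than A, C, G, T) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance: d = -(3/4) * ln(1 - (4/3) * p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        # The correction is undefined once the sequences look saturated.
        return math.inf
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)
```

For two aligned sequences that differ at 10% of their sites, the corrected distance is slightly larger than 0.10, because the model allows for multiple substitutions at the same site.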

Calculations of richness parameters and evenness involving large sequence sets such
as the one constructed for this project become complicated very quickly; therefore,
algorithm-based software packages that perform the calculations become critical. In
2004, a program called DOTUR was introduced to overcome some of the limitations of
Phylip's obsolete programs (http://www.plantpath.wisc.edu/fac/joh/dotur.html). DOTUR
takes as input a distance matrix created by the Phylip DNADIST program and assigns
the sequences to operational taxonomic units (OTUs) at various evolutionary distance
levels using different clustering algorithms. OTUs are basic groupings determined by
sequence similarity. The program calculates several known diversity indices and
rarefaction data (Schloss 2005). Several studies have used DOTUR to calculate diversity
parameters (Francis et al., Sogin et al.). This project used DOTUR version 1.53,
executed in November 2007, to calculate the ACE and Chao1 estimators, the components
needed for the evenness calculation, and rarefaction data.

DOTUR can use several methods to determine sequence similarities and to group
sequences into OTUs according to evolutionary distances. The first method is referred to
as the Nearest Neighbor method, which assumes that each sequence within an OTU is at
most X% different from the most similar sequence in the group. The second method is
referred to as the Furthest Neighbor method, which assumes that each sequence within an
OTU is at most X% different from any other sequence in the group. As the distance
is increased, a sequence added to the OTU must be within that distance of all other
sequences already in the OTU. The last method that DOTUR uses is the Average
Neighbor method, which is intermediate between the other two methods. The DOTUR manual
recommends the Furthest Neighbor method for 16S rRNA gene analysis (Schloss 2005).
DOTUR provides 23 output files. Each file provides information to graph rarefaction
data, diversity estimators, replicate data, or other classification data useful to researchers.
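
The Furthest Neighbor idea can be sketched as a simple agglomerative procedure. The sketch below is only an illustration of the rule that every pair of sequences in an OTU must lie within the cutoff; it is not DOTUR's implementation, and the function name and input layout (a symmetric list of lists of distances) are assumptions.

```python
def furthest_neighbor_otus(dist, cutoff):
    """Group sequence indices into OTUs by furthest-neighbor clustering.

    dist   : symmetric square matrix (list of lists) of pairwise distances
    cutoff : largest distance allowed between any two members of one OTU
    Returns a sorted list of OTUs, each a sorted list of sequence indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(c1, c2):
        # Furthest neighbor: the distance between two clusters is the
        # largest distance between any member of one and any of the other.
        return max(dist[i][j] for i in c1 for j in c2)

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > cutoff:
            break  # no remaining pair of clusters can be merged
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return sorted(clusters)
```

With a cutoff of 0.03 (that is, 97% similarity), a sequence that is within 0.03 of one OTU member but 0.04 from another is kept out of that OTU, whereas a Nearest Neighbor rule would admit it.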

As previously mentioned, DOTUR groups sequences into OTUs based on their
DNA sequence. There is much controversy over the evolutionary distance levels that
coincide with the species, genus, and phylum levels, and no firm cutoff has been established.
However, several prominent researchers have proposed that >97% similarity corresponds to the
species level, >95% to the genus level, >90% to the family level, and
>80% to the phylum level (Schloss 2005, Bond 1995, Everett 1999). Therefore, if
a sequence is >97% similar to another sequence, the organisms from which the sequences
originated are accepted to be the same species. This project uses the >97% and >80%
cutoff values to represent the species and phylum levels, respectively.

DOTUR generates outputs that enable calculation of several parameters of
interest. As mentioned previously, evenness is the relative distribution of
individuals among predefined units, such as species. There are numerous ways
to determine evenness. This project used the popular Pielou formula for the evenness
calculation. The Pielou formula is the ratio of the Shannon index to the maximum
value the index can take for the observed number of OTUs, which is reached when every
OTU holds the same number of individuals (for example, one individual per OTU)
(Kennedy 1995).
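
As a minimal sketch of the evenness calculation, assuming the input is simply a list of the number of sequences in each OTU (the function names are hypothetical and this is not DOTUR output parsing):

```python
import math


def shannon_index(otu_counts):
    """Shannon index H' = -sum(p_i * ln(p_i)) over OTUs with at least one member."""
    total = sum(otu_counts)
    return -sum((n / total) * math.log(n / total) for n in otu_counts if n > 0)


def pielou_evenness(otu_counts):
    """Pielou evenness J' = H' / ln(S), where S is the number of observed OTUs."""
    observed = sum(1 for n in otu_counts if n > 0)
    if observed < 2:
        raise ValueError("evenness needs at least two observed OTUs")
    return shannon_index(otu_counts) / math.log(observed)
```

A community with equal counts in every OTU gives an evenness of 1, while a community dominated by a single OTU gives a value close to 0.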

Good’s coverage was first introduced and defined by I.J. Good in 1953 as an
indication of sampling effort. Good defined coverage (C) by the following formula:

$$C = 1 - \frac{n_1}{N}$$

(Good 1953), where $N$ is defined as the community size and $n_1$ is defined as the number
of phylotypes appearing only once. Kemp and Aller described Good’s coverage as a
“non-parametric estimator of the proportion of phylotypes in a community of infinite size
that would be represented in a smaller community” (Kemp 2004). This parameter is
presented as a percentage; therefore, the higher the percentage, the higher the coverage,
or sampling effort, for that particular community.
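
A short sketch of the coverage calculation follows. It assumes that the community size N is the total number of sequences in the sample and that the input is a list of per-OTU counts; the function name is hypothetical.

```python
def goods_coverage(otu_counts):
    """Good's coverage C = 1 - n1 / N, returned as a fraction between 0 and 1.

    n1 is the number of OTUs seen exactly once; N is the total number of
    sequences (taken here as the community size).
    """
    total = sum(otu_counts)
    if total == 0:
        raise ValueError("no sequences in sample")
    singletons = sum(1 for n in otu_counts if n == 1)
    return 1.0 - singletons / total
```

For example, 20 sequences that fall into six OTUs, three of which are singletons, give a coverage of 0.85, or 85% when multiplied by 100.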

DOTUR also produces an output file entitled Rarefaction. This file contains the
rarefaction data for various evolutionary distances. A rarefaction curve compares
observed richness, or number of OTUs, with sampling effort. The data result from
averaging randomizations of the observed accumulation curve (Hughes 2001), a count of
the number of OTUs at a given sampling point. Constructing rarefaction curves for the
various subgroups provides a comparison of richness that is easy to interpret. DOTUR
uses 10,000 randomizations in its calculations. The data can then be graphed for further
analysis (Schloss 2005).
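
The averaging of randomized accumulation curves can be sketched as follows. This is an illustration only: it assumes sampling without replacement by shuffling the order of the sequences, it uses fewer randomizations by default than the 10,000 reported for DOTUR, and the function name and arguments are hypothetical.

```python
import random


def rarefaction_curve(otu_labels, n_randomizations=1000, seed=0):
    """Average number of OTUs observed after 1, 2, ..., N sequences.

    otu_labels : one OTU label per sequence, e.g. ["otu1", "otu1", "otu2"]
    Returns a list whose k-th entry (0-based) is the mean richness after
    k + 1 sequences have been drawn in random order.
    """
    rng = random.Random(seed)
    labels = list(otu_labels)
    totals = [0] * len(labels)
    for _ in range(n_randomizations):
        rng.shuffle(labels)  # one random ordering of the sample
        seen = set()
        for k, label in enumerate(labels):
            seen.add(label)
            totals[k] += len(seen)  # accumulation curve for this ordering
    return [t / n_randomizations for t in totals]
```

The curve always starts at 1 and ends at the observed number of OTUs; how quickly it flattens indicates how thoroughly the community has been sampled.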

A non-parametric estimator was defined by Chao in 1984. Chao1 estimates the
total species richness by the formula:

$$S_{Chao1} = S_{obs} + \frac{n_1^2}{2 n_2}$$

where $S_{obs}$ is the number of observed OTUs, $n_1$ is the number of
singletons, or OTUs occurring only once, and $n_2$ is the number of doubletons, or OTUs
occurring twice (Hughes 2001, Schloss 2005, Chao 1984). This estimator is particularly
useful when data sets are skewed toward the low-abundance classes, as they are likely to
be in microbial communities (Hughes 2001). The DOTUR program uses the above
formula to calculate the Chao1 file only when $n_1 = 0$ and $n_2 > 0$. However, when $n_1 > 0$ and
$n_2 > 0$ and when $n_1 = 0$ and $n_2 = 0$, DOTUR uses the formula:

$$S_{Chao1} = S_{obs} + \frac{n_1 (n_1 - 1)}{2 (n_2 + 1)}$$
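
Both forms of the Chao1 estimator can be written as a short sketch. The function names are hypothetical, and the sketch deliberately leaves the choice between the two formulas to the caller through a flag instead of reproducing DOTUR's own selection rule.

```python
def chao1_classic(s_obs, n1, n2):
    """Classic Chao1: S_obs + n1^2 / (2 * n2). Requires n2 > 0."""
    if n2 <= 0:
        raise ValueError("classic Chao1 is undefined when there are no doubletons")
    return s_obs + (n1 * n1) / (2.0 * n2)


def chao1_bias_corrected(s_obs, n1, n2):
    """Bias-corrected Chao1: S_obs + n1 * (n1 - 1) / (2 * (n2 + 1))."""
    return s_obs + (n1 * (n1 - 1)) / (2.0 * (n2 + 1))


def chao1_from_counts(otu_counts, bias_corrected=True):
    """Compute Chao1 from a list of per-OTU sequence counts."""
    s_obs = sum(1 for n in otu_counts if n > 0)
    n1 = sum(1 for n in otu_counts if n == 1)  # singletons
    n2 = sum(1 for n in otu_counts if n == 2)  # doubletons
    if bias_corrected:
        return chao1_bias_corrected(s_obs, n1, n2)
    return chao1_classic(s_obs, n1, n2)
```

With seven observed OTUs, three singletons, and two doubletons, the classic form gives 9.25 and the bias-corrected form gives 8.0.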

The ACE estimator incorporates data from all OTUs with 10 or fewer
individuals. This includes more than just the singletons and doubletons. The ACE
estimator is defined by DOTUR as the formula:

$$S_{ACE} = S_{abund} + \frac{S_{rare}}{C_{ACE}} + \frac{n_1}{C_{ACE}} \gamma^2_{ACE}$$

where

$$C_{ACE} = 1 - \frac{n_1}{N_{rare}}$$ (coverage),

$$\gamma^2_{ACE} = \max\left[ \frac{S_{rare}}{C_{ACE}} \cdot \frac{\sum_{i=1}^{10} i (i - 1) n_i}{N_{rare} (N_{rare} - 1)} - 1,\ 0 \right]$$ (coefficient of variation),

$n_i$ is the number of OTUs with $i$ individuals, $S_{rare}$ is the number of OTUs with 10 or fewer
individuals, $S_{abund}$ is the number of OTUs with more than 10 individuals, and
$N_{rare} = \sum_{i=1}^{10} i\, n_i$ is the number of individuals in the rare OTUs (Schloss 2005).
Both the ACE and the Chao1 estimators underestimate true richness at low sample sizes
(Hughes 2001).
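
The ACE calculation can be sketched in the same way, taking a list of per-OTU counts as input. The function name is hypothetical, and the handling of the edge cases (no rare OTUs, or every rare individual being a singleton) is an assumption of this sketch, not documented DOTUR behavior.

```python
def ace_estimate(otu_counts, rare_threshold=10):
    """Abundance-based coverage estimator (ACE) of total richness."""
    counts = [n for n in otu_counts if n > 0]
    rare = [n for n in counts if n <= rare_threshold]
    s_rare = len(rare)                 # OTUs with 10 or fewer individuals
    s_abund = len(counts) - s_rare     # OTUs with more than 10 individuals
    n_rare = sum(rare)                 # individuals in the rare OTUs
    n1 = sum(1 for n in rare if n == 1)

    if n_rare == 0:
        return float(s_abund)          # assumption: nothing to extrapolate
    c_ace = 1.0 - n1 / n_rare          # sample coverage of the rare OTUs
    if c_ace == 0:
        raise ValueError("ACE is undefined when every rare individual is a singleton")

    # sum over rare OTUs of n * (n - 1) equals sum_i i * (i - 1) * n_i
    pair_term = sum(n * (n - 1) for n in rare)
    gamma2 = max((s_rare / c_ace) * pair_term / (n_rare * (n_rare - 1)) - 1.0, 0.0)
    return s_abund + s_rare / c_ace + (n1 / c_ace) * gamma2
```

For counts of 20, 5, 3, 2, 2, 1, 1, and 1, the sketch gives a coverage of 0.8, a squared coefficient of variation of 0.25, and an ACE estimate of 10.6875 from eight observed OTUs.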

Error 

DOTUR calculates not only the parameters but also a 95% confidence interval
for some of those parameters. Typically, in statistics, confidence intervals extend an
equal amount above and below the estimate of the parameter. DOTUR
intervals, however, tend to extend further above the estimate than below it. The manual
does not address this phenomenon. However, because the majority of the parameters
estimated are shown in the literature to be underestimates of richness, it is possible that
the DOTUR creators put more emphasis on the high confidence limit to get a more
realistic range for the true value (Hughes 2001; Kemp & Aller 2004). Nevertheless, the
error reported by the DOTUR system, where provided, was used throughout all the
subsequent calculations. Error bars often appear in figures in peer-reviewed articles;
however, their interpretation is often incorrect. In this case, 95% confidence intervals
are used. Therefore, an overlap of more than half an error bar arm from one data set to
the next indicates the data sets are not significantly different. Any overlap of less than
half of an error bar arm, or no overlap, indicates the data sets are statistically different
(Cumming 2007).

Another phenomenon typical in statistics is that confidence intervals narrow
as the sample size increases. This is because confidence intervals are typically
calculated using the ratio of the standard deviation to the square root of the sample size as
a major component of the calculation. A set of data usually yields a better estimate of
variance as the sample size increases, so the total interval decreases (McClave et al.
2008). However, in microbial analysis the variance does not follow this typical trend.
For instance, in this research’s data the total population was so diverse that the sample
size was inadequate to estimate a variance. As more samples were taken, the variance
increased along with the sample size. This trend was seen throughout the analysis:
the confidence intervals did not get smaller with increased sample size. This again was a
testament to the vastness of the diversity in microbial communities.
Statistical Analysis

In order to compare microbial communities at the phylum level, as
established by RDP, Analysis of Similarity (ANOSIM) tests were used. These tests use
the ecological distances among samples, calculated from the untransformed data using
the Bray-Curtis measure (Clarke 1993). An observed test statistic, R, and a set of
randomly permuted test statistics were generated using Primer-E v. 6.0. Communities
were considered statistically different if fewer than 5% of the permuted test statistics
were greater than or equal to the observed test statistic. This method has recently been
applied to microbiological studies (Isenhouer 2007). These tests bring a measure
of statistical rigor to studies characterizing microbial community composition.
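
The logic of an ANOSIM test can be sketched as below. This is not the Primer-E implementation; it is a minimal illustration that assumes each sample is a list of phylum counts, uses Bray-Curtis dissimilarity, ranks the dissimilarities (averaging ties), and computes R = (mean between-group rank - mean within-group rank) / (M / 2), where M is the number of sample pairs. All function names are hypothetical.

```python
import random
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def anosim_r(ranks, pairs, groups):
    """ANOSIM R statistic for one assignment of samples to groups."""
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    if not within or not between:
        raise ValueError("need both within-group and between-group pairs")
    mean_w = sum(within) / len(within)
    mean_b = sum(between) / len(between)
    return (mean_b - mean_w) / (len(pairs) / 2.0)


def anosim(samples, groups, n_permutations=999, seed=0):
    """Return (observed R, permutation p-value)."""
    pairs = list(combinations(range(len(samples)), 2))
    ranks = average_ranks([bray_curtis(samples[i], samples[j]) for i, j in pairs])
    observed = anosim_r(ranks, pairs, groups)
    rng = random.Random(seed)
    shuffled = list(groups)
    at_least = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # randomly reassign group labels
        if anosim_r(ranks, pairs, shuffled) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_permutations + 1)
```

An R near 1 indicates that samples within a group are more similar to each other than to samples from other groups, and an R near 0 indicates no group structure. The p-value is the share of permuted R values at least as large as the observed one.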
Chapter III: Methodology


Experimental  Overview 

Since the construction of the groundwater treatment wetland at WPAFB in 2000, many
research projects have focused on its hydraulic and remediation properties.
This specific project continued the research of Major Ethan Bishop, who provided the
experimental foundation summarized in the next section (AFIT/GES/ENV/06J-01).

In  2005,  mesocosms  were  constructed  at  Wright  State  University  from  soil  taken 
from  both  the  constructed  and  Valle  Green  wetlands  in  Beavercreek,  Ohio.  The 
constructed  wetland  had  already  shown  PCE  degradation;  therefore,  soil  from  the 
constructed  wetland  was  used  to  “inoculate”  the  soil  from  Valle  Green.  This  ensured  the 
soil  microbial  community  would  have  a  healthy  consortium  of  PCE  degraders,  since,  at 
the  time,  it  was  uncertain  whether  PCE  degraders  were  part  of  the  microbial  community 
of  Valle  Green.  Prior  to  the  construction  of  the  mesocosms,  samples  from  the  inoculated 
soil  were  taken  to  establish  baseline  data  for  the  microbial  community  prior  to  planting  of 
the  columns  or  PCE  exposure. 


Figure  1:  Mesocosm  Design 

All  measurements  in  inches 

Figure  1  illustrates  the  column  design  and  dimensions  for  the  mesocosms  (Bishop, 
2006).  Each  mesocosm  was  constructed  from  6-in  diameter  PVC  pipe  with  a  depth 
representative  of  the  actual  WPAFB  constructed  wetland.  Three  wetland  plants, 
Eleocharis  erythropoda  (Spike  Rush),  Carex  comosa  (Bearded  Sedge),  and  Scirpus 
atrovirens  (Green  Bulrush),  were  used  in  this  experiment.  A  single  species  was  planted 
in  each  mesocosm  in  an  effort  to  characterize  its  effects  on  its  associated  microbial 
community.  Three  control  mesocosms  were  also  established  for  comparison  of  microbial 
communities  that  developed  without  higher  plant  association. 




Table 1: Mesocosm Plantings (Bishop 2006)

Mesocosm    Species
1           Carex comosa
2           Carex comosa
3           Control
4           Eleocharis erythropoda
5           Scirpus atrovirens
6           Scirpus atrovirens
7           Eleocharis erythropoda
8           Control
9           Scirpus atrovirens
10          Eleocharis erythropoda
11          Control
12          Scirpus atrovirens

After the plants had grown for 2 months, 5-gram soil samples were taken from each
mesocosm at each of three separate depths: depth 1, 49 inches (bottom sample); depth 2,
31 inches (middle sample); and depth 3, 13 inches (top sample). Root mass was observed
in all samples, demonstrating that the plant roots had extended the entire length of the
mesocosms (Bishop 2006).

DNA was extracted from the 36 soil samples using the Mo Bio PowerSoil™
DNA Isolation Kit with the standard protocol (Appendix C). PCR was performed with
these DNA extracts as the templates to amplify the 16S rRNA genes. Universal primers
E8F and E533R were used for PCR because both are highly sensitive for detecting
bacteria. While primer E8F has a slight affinity for Archaea and primer E533R has an
affinity for both Archaea and Eukarya, these two universal primers are specific enough to
bacteria to meet the goals of this project (Baker 2003). The PCR protocol and conditions
used for this experiment are summarized in Appendix A. Of the PCR products generated,
357 were cloned and sequenced during the course of the Bishop project. The original
PCR reactions were frozen at -20°C for future research (Bishop 2006).

Nomenclature 

This project combined data from Bishop’s research with new sequence data obtained
from Bishop’s original PCR reactions that had been stored as described above.
Therefore, a unique nomenclature was required. Bishop labeled all his soil samples with
an “A” and two subsequent numbers. The “A” represented August, the month of soil
extraction; the first number denoted the column number; and the second number
represented the depth of the sample. During the course of generating the sequence data,
additional numbers were added to the sample name. The subsequent numbering
represented the cloning reaction, plate number, and colony number, respectively.

As new cloning reactions were performed for this project, the labeling system was
adjusted to differentiate the Bishop data from the new data. The leading letters represented
the month of cloning (Appendix B) and were followed by the column and depth numbers.
The next letter was always “L”, indicating that the cloning reaction was performed during
the Leon project. The number after the “L” was the cloning reaction. This project
performed only one cloning reaction for each PCR tube; therefore, the number after the
letter “L” was always 1 for all the new data. The subsequent numbers represented the
plate number and colony number, respectively. On average, five plates were used for
each cloning reaction. For instance, the sample identified as Ju53.L1.1.1 was cloned in
the month of June from column 5, depth 3; it came from the first Leon cloning reaction
and was the first colony picked from plate 1. The detailed nomenclature was crucial to
this project. The column and depth from which a particular sample originated were used
throughout the analysis of all the data. During sequencing, several sample names had to
be adjusted due to space limitations and procedural criteria. Therefore, the original
nomenclature was not entirely preserved. However, each sample is uniquely identifiable,
and the column number and depth are always evident.
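
Because the column and depth were recovered from the sample names throughout the analysis, a small parser illustrates how the new-style names can be decoded. This is a sketch under assumptions: the regular expression and function name are hypothetical, the month code is returned as written because its mapping is given in Appendix B, the depth is assumed to be a single digit from 1 to 3, and an optional hyphen between column and depth is accepted because names such as Ap10-2 appear in the sequence headers.

```python
import re

# month letters, column (1-2 digits), optional hyphen, depth (1-3),
# then ".L<cloning>.<plate>.<colony>"; anything after the colony is ignored.
_SAMPLE_ID = re.compile(
    r"^(?P<month>[A-Za-z]+)"
    r"(?P<column>\d{1,2})-?(?P<depth>[1-3])"
    r"\.L(?P<cloning>\d+)\.(?P<plate>\d+)\.(?P<colony>\d+)"
)


def parse_sample_id(sample_id):
    """Split a Leon-style sample name into its labeled parts."""
    match = _SAMPLE_ID.match(sample_id)
    if match is None:
        raise ValueError("unrecognized sample name: " + sample_id)
    fields = match.groupdict()
    return {
        "month_code": fields["month"],
        "column": int(fields["column"]),
        "depth": int(fields["depth"]),
        "cloning": int(fields["cloning"]),
        "plate": int(fields["plate"]),
        "colony": int(fields["colony"]),
    }
```

Applied to Ju53.L1.1.1, the sketch returns month code Ju, column 5, depth 3, cloning reaction 1, plate 1, and colony 1.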

Laboratory  procedures 

PCR  amplifications  from  the  Bishop  project  were  frozen  and  stored  at  -20°C.  In 
January  of  2007,  Bishop’s  stored  PCR  products  were  used  for  additional  cloning  and 
DNA  sequencing.  The  cloning  was  executed  using  the  StrataClone™  PCR  Cloning  Kit 
(Stratagene,  La  Jolla,  CA;  Appendix  D). 

Four to five plates of Luria-Bertani (LB) media, supplemented with ampicillin
(AMP), were used for each cloning. Each plate received on average 50 µl of the
transformation mixture. LB media is a rich medium commonly used to grow E. coli, and
1 L is prepared using the following recipe (Difco Manual 1998):

10.0 g Tryptone
5.0 g Yeast Extract
10.0 g NaCl
Distilled or deionized water, used to fill to 1 Liter
Adjust the pH to 7.5
15.0 g of agar

After the LB media was thoroughly mixed, it was autoclaved on liquid cycle for 20 minutes
at 15 psi and 121°C. Next, the mixture was placed in a 55°C water bath to cool. AMP
was added to a final concentration of 50 µg/ml. The addition of AMP to the media was a
crucial step for using the selectable marker built into the standard cloning kit. Also, the
substrate 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-gal), from a stock
concentration of 20 mg/ml, was diluted to a final concentration of 40 µg/ml in the medium
for blue-white screening (Chaffin 1998). The purpose of AMP and X-gal addition is
explained in a later section.

The plates onto which transformations from the StrataClone kit had been spread
were incubated overnight at 37°C. One hundred white colonies from each transformation
were chosen from the plates and aseptically transferred with sterile toothpicks to Falcon®
tubes with 5 mL of LB broth with AMP (final concentration of 50 µg/ml). AMP in this
medium helped maintain selection for cells that received a plasmid. Following a ~16 hour
incubation at 37°C with shaking at 150-175 rpm, the Falcon® tubes were centrifuged at
6,800 x g with an Avanti® J-26 XPI centrifuge for 15 minutes at 20°C. Media was
poured off, the tubes were blotted on paper towels, and cell pellets were used for plasmid
isolation. QIAgen’s QIAprep Spin Miniprep Kit (QIAgen Inc., Valencia, CA) was used
to purify and isolate plasmid DNA. The “QIAprep Spin Miniprep Kit Using a
Microcentrifuge” protocol was used for this procedure (Appendix E). Throughout the
process the samples were labeled uniquely.

Quality  Check  for  Laboratory  Procedures 

During the laboratory procedures numerous quality checks were in place. The
plasmids and competent cells used in the StrataClone kit were engineered with several
verification mechanisms. PCR products were cloned into a plasmid that would replicate
within a host E. coli cell. The intention was that only plasmids that had the
PCR product inserted into them would be able to replicate within a cell. It was possible,
however, that the cloning procedures produced plasmids, and ultimately cells, that were
replicating without the PCR product insert. Therefore, blue/white screening and a
selectable marker were used. These procedures are explained below.

AMP is an antibiotic used to prevent contamination; however, that was not its
primary purpose in this procedure. Cells that received a cloning plasmid were resistant to
AMP and could grow uninhibited on the LB+AMP media. Another goal was to
proceed only with cells that received a plasmid with a PCR product insert. Blue-white
screening is a useful tool for making this determination. A successful cloning disrupts an
enzyme reaction within the cell. X-gal is a colorless modified galactose sugar and is the
substrate for this reaction (Chaffin 1998, Stratagene® 2007). If a PCR product has been
inserted into the functional gene encoding the enzyme, the X-gal will not be used by the
cells, and the resultant colony will be white on the plate (Messing 1977, Stratagene®
2007). The cells that do use the X-gal, indicating that they carry a plasmid with no PCR
insert, will turn blue. The white colonies were removed from the plate and placed in 5 ml
of LB broth with AMP. The AMP here maintains the selection of cells that have the
plasmid, because it is possible for cells to lose the plasmid during growth.

EcoRI Restriction Enzyme Digestion and Gel Electrophoresis

Once the plasmids were isolated, quality checks were run on selected samples to
ensure that the correct plasmids had been isolated and that they carried the inserted PCR
products prior to sequencing. After four cloning reactions in which the insertion was
100% efficient, this step was no longer performed, in order to expedite the sequencing
process. Isolated plasmids were digested with the restriction enzyme EcoRI, which cuts
the plasmid at sites that flank the PCR insert. Figure 2 below shows a gel
demonstrating the successful separation of the target DNA. The PCR insert bands
migrate to approximately the 500 bp band, while the plasmid band is approximately 3.5
kb. The variability in the migration of the PCR band in the different lanes was expected,
since the organisms may have inserts ranging from ~450 bp to ~600 bp (Woese 1987). The
protocol used for the restriction digest is summarized in Appendix F, and all gels are
shown in Appendix G.


Figure 2: Gel from Ap53.L1

Lane 1: Ap53.L1.5.6; Lane 2: Ap53.L1.3.18; Lane 3: Ap53.L1.3.14; Lane 4: Ap53.L1.3.10; Lane 5:
100 bp ladder; Lane 6: Ap53.L1.2.5; Lane 7: Ap53.L1.3.2; Lane 8: Ap53.L1.5.18; Lane 9: Ap53.L1.5.14;
Lane 10: Ap53.L1.5.13
Sequencing and Trimming


Through  the  quality  control  procedures  described  above,  it  was  evident  that  the 
cloning  and  plasmid  purification  protocol  worked  and  PCR  inserts  could  be  sequenced. 
Prior to sequencing, DNA concentrations were determined because both sequencing
facilities required a concentration of 50 ng/µl or above. The sample DNA
concentrations were determined after plasmid purification and isolation using a
NanoDrop system, a spectrophotometer that evaluates samples as small as 1 µl.
The DNA samples were loaded onto the NanoDrop instrument and DNA concentrations
were recorded by hand for the sequencing facilities. Only samples that fell within the
desired range were submitted for sequencing.

Due  to  the  large  number  of  isolated  plasmids,  sequencing  was  handled  both  at  the 
WSU  Genomics  Laboratory  (EEEGL)  and  through  the  Ohio  State  University’s  (OSU) 
Plant-Microbe  Genomics  Facility  (PMGF).  The  EEEGL  used  a  Beckman-Coulter 
CEQ8000  Genetic  Analysis  System,  while  the  PMGF  utilized  an  Applied  Biosystems 
platform.  Both  facilities  used  the  M13F  primer  to  recognize  the  Strataclone  plasmid  in 
sequencing  reactions,  and  provided  output  data  in  FASTA  format.  Chromatograms  were 
also  included  for  the  data.  On  a  few  occasions,  samples  that  failed  to  sequence  at  the 
EEEGL  were  submitted  to  the  PMGF,  which  returned  positive  results  for  those  samples. 
This prompted a closer look at the sequences from the two laboratories. Although both
laboratories produced usable sequences for analysis, the PMGF yielded readable
sequence output for 99% of plasmids submitted, whereas the EEEGL produced usable
sequences for an average of 90% of submissions. Sequences from the PMGF were
also typically longer (over 600 bp).
A thorough quality check procedure ensured that only good-quality sequence data
were analyzed further. As a first step, all sequences shorter than 300 base pairs (bp) were
automatically omitted, because they did not cover a large enough region of the 16S
rRNA gene to make a valid contribution to the project. During identification and
deletion of sequences shorter than 300 bp, sequences with numerous N’s or repeated
letters were identified and highlighted.

Repeated letters in sequences indicated possible contamination of the sample. N’s
appear in place of nucleotides when the sequencer could not collect sufficient evidence
to call a base; an N marks a position where any nucleotide could have been present.
Numerous N’s indicate that the sample was not concentrated enough
to produce a valid sequence (Isenhouer 2008, Servaites 2007). A qualitative assessment
of these sequences’ chromatograms was performed based on background noise and peak
height and spread. This step helped to identify samples that were contaminated or
sequenced at low concentrations, and those sequences were omitted. An example of this
step of editing is shown in Figure 3.
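
The first screening step can be sketched as a small script. Only the 300 bp minimum length comes from the procedure above; the numeric limits on N’s and on repeated letters are hypothetical placeholders, since the actual assessment was qualitative and relied on the chromatograms. The function names are also hypothetical.

```python
import re


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line.replace(" ", "").upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def screen_sequence(seq, min_length=300, max_n=5, max_run=10):
    """Return a list of reasons a sequence should be flagged (empty if none).

    min_length follows the 300 bp rule; max_n and max_run are illustrative
    thresholds chosen for this sketch only.
    """
    flags = []
    if len(seq) < min_length:
        flags.append("too_short")
    if seq.count("N") > max_n:
        flags.append("many_N")
    # a run longer than max_run of the same base, e.g. 11 or more A's
    if re.search(r"([ACGT])\1{%d,}" % max_run, seq):
        flags.append("long_repeat")
    return flags
```

Flagged sequences would then be checked against their chromatograms before a decision to omit them, as described above.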
>Ap10-2.L1.1.01.B01_07052311NQ 768 14 768 CEQ

GGAGTTGTTCACACGGGCCAGTGAGCGCGCTAATACGATCTCACTATAGGGCGAATTGGAGCTCCCGCGTTGCCACGCT 

ACT  AGAACTAGT  GGATCCCCCGGGTCTT  GC  AGC  AC  ATT  GTT  GGA  ATT  CGCCCTT  AGAGTTT  GAT  CCTGGCTC  AGAGT  GA 

ACGCT  GGCGGC  AGGCT  A  AC  AC  AT  GC  AAGT  CGA  ACGGC  AGC  AC  AGGGGAGCTT  GCCTNGGGT  GGCG  AGT  GGCGGACGG 

GTGAGGAATACATGGGAATCTACCCTGTCGTGGGGGATAACGTAGGGAAACTTACGCTAATACCGCATACGACCGAGA 

GTTGAAAGCGGCGGACCGAAGGCGTCACGCGACTGGATGAGCCCATGTCGGATTAGCTAGTTGGCGGGGTAAAGGCCC 

ACCAAGGCGACGATCCGTAGCTGGTCTGAGAGGATGATCAGCCACACTTGGAACTGAGACACGGTCCAAGACTCCTAC 

GGGAGGC  AGC  AGT  GGGGGAATATT  GGAC  AAT  GGGCGC  A  AGGT  AT  CCC  AGCC  AT  GCCGCGT  GGGT  GAA  AGAAGGCCTT 

CGGGTTGTAAAGCCTTTTTGTCCCGGAAAGAAAAGCACGGGATTAAATACCCTCGTGTGATGACGGTACCCGGAAGAA 

ATACGC  A  ACCGGCT  ACCTTT  CGT  GT  C  A  AGC  AGCCCCCGGTT  C  A  AA  AGGGCG  AAAAT  CCC  AC  A  AGTT  GGA  AT  ATTC  AAG 

GCCTAATCGGATAACCGTCGACCCTCGAGCGCGCGGGCCCGGTTACCAAGCCTTTTTGTTTCCCTT 

>SSA12.1.18 

GGCCAGTGAATTGTAATACGACTCTTCTTATAGGGCGAATGGGGCCCTCTAGATGCTGCTCGAGCGGCCGCCAGTGTGA 

TGGATATCTGCAGAATTCGCCCTTAGAGTTTGATCCTGGCTCAGGGGATGAACGCTAGCGGCAGGCTTAATACATGCAA 

GTCGTGGGGCAGCATGTCCCGCAGCAATGCGGGATGATGGCGACCGGCAAACGGGTGCGGAACACGTACACAACCTTC 

CTTTTAGTGGAGAATAGCCCAGGGAAACTTGGATTAATACTCCGTAACATATAAGAAGTGGCATCACTTTTATATTAAA 

GC  AGC  AAT  GCGCT  GGA  AG  AT  GGGT  GT  GCGGCT  GATT  AG  AT  AGTT  GGCGGGGT  A  ACGGCCC  ACC  A  AGT  CGACGAT  C  AGT 

AACTGGTGTGAGAGCACGACCAGTCACACGGGCACTGAGACACGGGCCCGACTCCTACGGGAGGCAGCAGTAAGGAA 

TATTGGTCAATGGACGCAAGTCTGAACCAGCCATGCCGCGTGGAGGATGAAGGTCCTCTGGATTGTAAACTTCTTTTAT 

TT  GGGAGGA  A  AT  CC  ATTTTTT  CT  AA  A  AT  GGTT  GACGGTACC  AG  AT  GA  ATA  AGC  ACCGGCTA  ACTCT  GT  GCC  AGC  AGCCC 

CGGTCAAAGGGCGAATCCAGCACACTGGCGGCCGTTACTAGTGGATCGAGCTGGTACAAGCTGGCGTAATATGGCATG 

CTGTTTCGGTGTAATTGTATCGCTCCANTCCCACAACAACAGCCGAGCATAGGGTAAGCTGTGGT 

>SSA12.3.17 

TAAGCAAGCGCGGAGTGAAATTAGTAAATACGACTCACTATAGGGCGAATTGGGCCCTTCTAGATGCATGCTCGAGCG 
GCCCGCAGTGTGATGGATATCTGCAGAATTCGCCCTTAGAGTTTGATCCTGGCTCAGAACGAACGCTGGCGGCGTACTT 
A  AC  AC  AT  GCA  AGT  CG  A  ACG  AG  A  A  A  AG  AG  ACTT  CGGT  CT  CCG  A  AGT  A  A  A  AGT  G 
>SSA12.1.23 

GCC  AGT  G  AATT  GT  AAT  ACG  ACT  C  ACT  AT  AGGGCG  AATT  GGGCCCCTCT  AG  AT  GC  AT  GCT  CGAGCGGCCGCC  AGT  GT  GAT 

GGATATCTGCAGAATTCGCCCTTTGACCGGGGCTGCTGGCACAGAGTTAGCCGTCTCTTCCTCTTGCGGTACTATCACTT 

GCTTGTTCCCCGCATGACAGGAGTTTACAACCCGAAGGCCTTCATCCTCCACGCGGCGTCGCTCCATCAGGGTTTCCCC 

C  ATT  GT  G  AAAA  ATTCTCGACTGCT  GCC  ACCCGT  AGGT  GTCTGGACCGTATCTC  AGTT  CC  AGT  GT  GGCT  GGTCGT  CCTCT  C 

AGACC  AGCT  ACCCGTC  AT  CGCC  AT  GGT  GGGCCGTTACCCCGCC  AT  CT  AGCTGATAGGCCGCGAGCTC  AT  C  AGG  AAGCG 

CATTGCTGCTTTGGCTTTTCCTCCAATCGAAGGATGGCCATATGCGGTATTAATTCGCCTTTCGGCGAGCTATCCCCCAC 

TTCCCGGCAGATTGCTCACGTGTTACGCACCCGTGCGCCACTGAACCAAGCCTGTATTGCTACAAACCTAGTCCGTTCG 

ACTTGCATGTCTTATCCACGCCGCCAGCGTTCGTTCTGAGCCAGGATCAAACTCTAAGGGCGAATCCAGCACACTGCGG 

GCGTACTAGTGGATCGAGCTCGGTACAGCTGCGTATCA 

>F11.L1.3.33.F07 070412218E 680 0 680 CEQ

CGGCCAGTGAGCGCGCTAATACGACTCACTATAGGGCGAATTGGAGCTCCCGCGTGCCCGCTACTAGAACTAGTGGAT 
CCCCCGGGACT  GC  AGC  AAT  GGT  GGAATT  CGCCCTT  AG  AGTTT  GAT  CCTGGCTC  AG  ATT  G  AACGCTGGCGGC  AAAT  GGC  A 
TAATAAAAACAAACAAATGGACAAAAAAGNTACAGAAAAAACGGCNGAAAAAAAAGGGGGGGGGGGGGCAAAAAAC 
CACAAAAAAAAGGGTAAAAGGAAGGGTTGGGGCCGGAAAAAACGGGGGNGGGGTGGAAAGGTTAAAAAAAATTAAA 
ACAAAATTTTCCCCGGGGGGGAAAAAAAAAACCGGGGTTTTTTTGGGCCACACAACACCCCCACCCACAAAAAAAAAT 

GGGGGGGGGGGGGGGGGGGGGAAAAAAAAAAAACACACCACAAAAAAAAAAAAAAAAAAAAAACACCCCCCCCCCC 

CCCCCCCCCCACCCCTCACTTTTTTTTTTTTCTCCCCCCCCCCCCGCCGCGNGGGGGGGGGGGGGGAAAAAAAAAAAAA 

AAAGAAGGGGGGGGGGGGGTAAAAAAAAAAAAAAAAAAAAAAAATTTTTTTATATATATT 

Figure  3:  Editing  Step  1 

Example  of  short  sequences  and  sequences  with  repeated  letters  and  N’s 


In the next step, sequences were analyzed with the Ribosomal Database Project II
(RDP) release 9.57 Classifier to determine the closest match to known 16S rRNA
sequences within the RDP database. Each rRNA query sequence was assigned to a
phylum when the match to a sequence within the database reached a confidence of at
least 80%. An average of 0.5% of the sequences fell into an Unclassified Root category
(Cole et al. 2007). Unclassified Root refers to sequences that the Classifier could not
identify as bacterial 16S genes. They could have been non-16S genes, 16S genes from
non-bacteria, or sequences of low quality (RDP Staff 2007). The Unclassified Bacteria
category referred to any sequence that was identified as Bacteria but did not match a
particular phylum with a confidence level of 80% or better. Pie graphs were constructed
for each community based on the RDP Classifier program results.
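
The thresholding rule described above can be sketched in a few lines of Python. This is
only an illustration, not the RDP Classifier itself: the function name phylum_call, the
bracketed "Name[NN%]" input format (modelled on the Classifier's assignment detail
output), and the assumption that ranks arrive in the order Root, domain, phylum are all
choices made here for the example.

```python
import re

def phylum_call(assignment, threshold=80):
    """Return the phylum label for one classifier assignment line.

    `assignment` is assumed to look like
    "Root[100%] Bacteria[100%] Proteobacteria[100%] ..." with ranks
    ordered Root, domain, phylum, and so on (an assumption of this sketch).
    """
    # Pull out (name, confidence) pairs in the order they appear.
    ranks = [(name, int(conf))
             for name, conf in re.findall(r"(\S+?)\[(\d+)%\]", assignment)]

    # No confident domain call: the sequence stays at the root.
    if len(ranks) < 2 or ranks[1][1] < threshold:
        return "Unclassified Root"

    # Confident domain call but no confident phylum call.
    if len(ranks) < 3 or ranks[2][1] < threshold:
        return "Unclassified Bacteria"

    return ranks[2][0]
```

For example, phylum_call("Root[100%] Bacteria[100%] Verrucomicrobia[60%]") falls below
the 80% threshold at the phylum rank and is therefore labelled Unclassified Bacteria.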

The symbol "-" after a sequence name in the assignment detail view of the RDP
Classifier program indicated that the match occurred using the reverse complement of
that particular sequence (Cole 2007; Wang 2007). These sequences were identified and
reverse complemented (RC) using the Reverse Complement Program
(http://www.bioinformatics.org/sms/rev_comp.html). An example of this editing step
is shown in Figure 4. This was done so that all sequences would be in the proper
orientation (reading 5' to 3') before the remaining editing and trimming quality-control
steps described below.
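
The reverse-complement operation itself is simple to express in code. The following is
a minimal sketch, not the web tool cited above; the function name reverse_complement is
hypothetical, and the sketch handles only the bases A, C, G, T and the ambiguity code N.

```python
# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    # Complement every base, then reverse so the result reads 5' to 3'.
    return seq.translate(_COMPLEMENT)[::-1]
```

Applied to the first 13 bases of a forward read such as GGCCAGTGAATTG, the function
returns CAATTCACTGGCC, which is how the same vector region appears at the end of a read
obtained from the opposite strand.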
[Screenshot of the Ribosomal Database Project II web page, Classifier :: Assignment Detail view]

Classifier: Naive Bayesian rRNA Classifier Version 2.0, July 2007
Taxonomical Hierarchy: Taxonomic Outline of the Bacteria and Archaea, release 7.8
Query Submit Date: Tue Jan 29 09:41:20 EST 2008

Lineage: Root (3)

Assignment Detail (for Confidence threshold: 80%):

Apl0-2.L1.1.01.B01_07052311NQ  Root[100%] Bacteria[100%] Proteobacteria[100%] Gammaproteobacteria[95%] Xanthomonadales[92%] Xanthomonadaceae[92%] Xylella[50%]

SSA12.1.18  Root[100%] Bacteria[100%] Bacteroidetes[100%] Sphingobacteria[100%] Sphingobacteriales[100%] Crenotrichaceae[95%] Terrimonas[95%]

SSA12.1.23 -  Root[100%] Bacteria[100%] Verrucomicrobia[100%] Verrucomicrobiae[100%] Verrucomicrobiales[100%] Xiphinematobacteriaceae[68%] Xiphinematobacteriaceae_genera_incertae_sedis[64%]


>Apl0-2.L1.1.01.B01_07052311NQ  768  14  768  CEQ

GGAGTTGTTCACACGGGCCAGTGAGCGCGCTAATACGATCTCACTATAGGGCGAATTGGAGCTCCCGCGTTGCCACGCT 

ACTAGAACTAGTGGATCCCCCGGGTCTTGCAGCACATTGTTGGAATTCGCCCTTAGAGTTTGATCCTGGCTCAGAGTGA 

ACGCT  GGCGGC  AGGCT  A  AC  AC  AT  GC  AAGT  CGA  ACGGC  AGC  AC  AGGGGAGCTT  GCCTNGGGT  GGCG  AGT  GGCGGACGG 

GTGAGGAATACATGGGAATCTACCCTGTCGTGGGGGATAACGTAGGGAAACTTACGCTAATACCGCATACGACCGAGA 

GTTGAAAGCGGCGGACCGAAGGCGTCACGCGACTGGATGAGCCCATGTCGGATTAGCTAGTTGGCGGGGTAAAGGCCC 

ACCAAGGCGACGATCCGTAGCTGGTCTGAGAGGATGATCAGCCACACTTGGAACTGAGACACGGTCCAAGACTCCTAC 

GGGAGGC  AGC  AGT  GGGGGAATATT  GGAC  AATGGGCGC  A  AGGT  AT  CCC  AGCC  AT  GCCGCGT  GGGT  GAAAGAAGGCCTT 

CGGGTTGTAAAGCCTTTTTGTCCCGGAAAGAAAAGCACGGGATTAAATACCCTCGTGTGATGACGGTACCCGGAAGAA 

AT  ACGC  A  ACCGGCT  ACCTTT  CGT  GT  C  A  AGC  AGCCCCCGGTT  C  A  AA  AGGGCG  AAAAT  CCC  AC  A  AGTT  GG  A  AT  ATTC  AAG 

GCCTAATCGGATAACCGTCGACCCTCGAGCGCGCGGGCCCGGTTACCAAGCCTTTTTGTTTCCCTT 

>SSA12.1.18 

GGCCAGTGAATTGTAATACGACTCTTCTTATAGGGCGAATGGGGCCCTCTAGATGCTGCTCGAGCGGCCGCCAGTGTGA 

TGGATATCTGCAGAATTCGCCCTTAGAGTTTGATCCTGGCTCAGGGGATGAACGCTAGCGGCAGGCTTAATACATGCAA 

GTCGT  GGGGC  AGC  AT  GT  CCCGC  AGC  AAT  GCGGGAT  GAT  GGCGACCGGC  AAACGGGT  GCGGAAC  ACGTAC  AC  AACCTTC 

CTTTTAGTGGAGAATAGCCCAGGGAAACTTGGATTAATACTCCGTAACATATAAGAAGTGGCATCACTTTTATATTAAA 

GCAGCAATGCGCTGGAAGATGGGTGTGCGGCTGATTAGATAGTTGGCGGGGTAACGGCCCACCAAGTCGACGATCAGT 

AACTGGTGTGAGAGCACGACCAGTCACACGGGCACTGAGACACGGGCCCGACTCCTACGGGAGGCAGCAGTAAGGAA 

TATTGGTCAATGGACGCAAGTCTGAACCAGCCATGCCGCGTGGAGGATGAAGGTCCTCTGGATTGTAAACTTCTTTTAT 

TTGGGAGGAAATCCATTTTTTCTAAAATGGTTGACGGTACCAGATGAATAAGCACCGGCTAACTCTGTGCCAGCAGCCC 

CGGTCAAAGGGCGAATCCAGCACACTGGCGGCCGTTACTAGTGGATCGAGCTGGTACAAGCTGGCGTAATATGGCATG 

CTGTTTCGGTGTAATTGTATCGCTCCANTCCCACAACAACAGCCGAGCATAGGGTAAGCTGTGGT 

>SSA12.1.23(RC) 

TGATACGCAGCTGTACCGAGCTCGATCCACTAGTACGCCCGCAGTGTGCTGGATTCGCCCTTAGAGTTTGATCCTGGCT 

CAGAACGAACGCTGGCGGCGTGGATAAGACATGCAAGTCGAACGGACTAGGTTTGTAGCAATACAGGCTTGGTTCAGT 

GGCGCACGGGTGCGTAACACGTGAGCAATCTGCCGGGAAGTGGGGGATAGCTCGCCGAAAGGCGAATTAATACCGCAT 

ATGGCCATCCTTCGATTGGAGGAAAAGCCAAAGCAGCAATGCGCTTCCTGATGAGCTCGCGGCCTATCAGCTAGATGG 

CGGGGT  AACGGCCC  ACC  AT  GGCGAT  GACGGGT  AGCT  GGT  CT  GAGAGGACGACC  AGCC  AC  ACT  GGAACT  GAGATACGG 

TCCAGACACCTACGGGTGGCAGCAGTCGAGAATTTTTCACAATGGGGGAAACCCTGATGGAGCGACGCCGCGTGGAGG 

ATGAAGGCCTTCGGGTTGTAAACTCCTGTCATGCGGGGAACAAGCAAGTGATAGTACCGCAAGAGGAAGAGACGGCTA 

ACT  CT  GT  GCC  AGC  AGCCCCGGT  C  A  A  AGGGCGAATTCTGC  AGAT  AT  CC  ATC  AC  ACTGGCGGCCGCT  CGAGC  AT  GC  ATCT  A 

GAGGGGCCCAATTCGCCCTATAGTGAGTCGTATTACAATTCACTGGC 


Figure  4:  Editing  Step  2 

RDP  Classifier  program  assignment  detail  view  to  identify  RC  sequences 
At this stage the sequences could still contain embedded plasmid, primer, and EcoRI
restriction site sequences. The next step was to trim the sequences to remove these
irrelevant pieces. They are a consequence of the sequencing reaction, in which DNA
extension from the sequencing primer can proceed past the PCR insert of interest and
into the flanking EcoRI restriction sites and further plasmid sequence. The EcoRI
restriction sites provided a convenient means of locating these flanking sequences, as
did the sequences of the original primers used to amplify the 16S rRNA gene. Since
these sequences represented something other than the actual 16S rRNA sequences that
were needed for analysis, it was important that they be trimmed away. The primers and
restriction sites were located with the Find function in Microsoft Word 2003 and
highlighted. An example of this editing step is shown in Figure 5.
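
The find-and-highlight step described above could also be done programmatically. The
following is a minimal sketch under stated assumptions: the only marker searched for is
the EcoRI recognition sequence GAATTC, and the helper names find_all and trim_flanks are
hypothetical. The manual procedure in this project also used the primer sequences and
the BLAST alignment region, which this sketch ignores.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence

def find_all(seq, motif):
    """Return the 0-based start position of every occurrence of motif in seq."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions

def trim_flanks(seq, site=ECORI_SITE):
    """Keep the stretch after the first site and before the last site.

    If only one site is found (the read ended before the second flank),
    everything downstream of that site is kept. If no site is found,
    the sequence is returned unchanged.
    """
    hits = find_all(seq, site)
    if not hits:
        return seq
    begin = hits[0] + len(site)
    end = hits[-1] if len(hits) > 1 else len(seq)
    return seq[begin:end]
```

Returning the sequence unchanged when no site is found is a deliberate choice here, so
that such reads can be flagged for manual inspection rather than silently discarded.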
>Apl0-2.L1.1.01.B01_07052311NQ  768  14  768  CEQ

GGAGTTGTTCACACGGGCCAGTGAGCGCGCTAATACGATCTCACTATAGGGCGAATTGGAGCTCCCGCGTTGCCACGCT 

ACTAGAACTAGTGGATCCCCCGGGTCTTGCAGCACATTGTTGGAATTCGCCCTTAGAGTTTGATCCTGGCTCAGAGTGA

ACGCT  GGCGGC  AGGCT  AAC  AC  AT  GC  AAGT  CGA  ACGGC  AGC  AC  AGGGGAGCTT  GCCTNGGGT  GGCG  AGT  GGCGGACGG 

GTGAGGAATACATGGGAATCTACCCTGTCGTGGGGGATAACGTAGGGAAACTTACGCTAATACCGCATACGACCGAGA 

GTTGAAAGCGGCGGACCGAAGGCGTCACGCGACTGGATGAGCCCATGTCGGATTAGCTAGTTGGCGGGGTAAAGGCCC 

ACCAAGGCGACGATCCGTAGCTGGTCTGAGAGGATGATCAGCCACACTTGGAACTGAGACACGGTCCAAGACTCCTAC 

GGGAGGC  AGC  AGT  GGGGGAATATT  GGAC  AAT  GGGCGC  A  AGGT  AT  CCC  AGCC  AT  GCCGCGT  GGGT  GAA  AGAAGGCCTT 

CGGGTTGTAAAGCCTTTTTGTCCCGGAAAGAAAAGCACGGGATTAAATACCCTCGTGTGATGACGGTACCCGGAAGAA 

ATACGC  A  ACCGGCT  ACCTTT  CGT  GT  C  A  AGC  AGCCCCCGGTT  C  A  AA  AGGGCG  AAAAT  CCC  AC  A  AGTT  GGA  AT  ATTC  AAG 

GCCTAATCGGATAACCGTCGACCCTCGAGCGCGCGGGCCCGGTTACCAAGCCTTTTTGTTTCCCTT 

>SSA12.1.18 

GGCCAGTGAATTGTAATACGACTCTTCTTATAGGGCGAATGGGGCCCTCTAGATGCTGCTCGAGCGGCCGCCAGTGTGA 
TGGATATCTGCAGAATTCGCCCTTAGAGTTTGATCCTGGCTCAGGGGATGAACGCTAGCGGCAGGCTTAATACATGCAA
GTCGTGGGGCAGCATGTCCCGCAGCAATGCGGGATGATGGCGACCGGCAAACGGGTGCGGAACACGTACACAACCTTC 
CTTTTAGTGGAGAATAGCCCAGGGAAACTTGGATTAATACTCCGTAACATATAAGAAGTGGCATCACTTTTATATTAAA 
GC  AGC  AAT  GCGCT  GGA  AG  AT  GGGT  GT  GCGGCT  GATT  AG  AT  AGTT  GGCGGGGT  A  ACGGCCC  ACC  A  AGT  CGACGAT  C  AGT 
AACTGGTGTGAGAGCACGACCAGTCACACGGGCACTGAGACACGGGCCCGACTCCTACGGGAGGCAGCAGTAAGGAA 
TATTGGTCAATGGACGCAAGTCTGAACCAGCCATGCCGCGTGGAGGATGAAGGTCCTCTGGATTGTAAACTTCTTTTAT 
TTGGGAGGAAATCCATTTTTTCTAAAATGGTTGACGGTACCAGATGAATAAGCACCGGCTAACTCTGTGCCAGCAGCCC 
CGGTCAAAGGGCGAATCCAGCACACTGGCGGCCGTTACTAGTGGATCGAGCTGGTACAAGCTGGCGTAATATGGCATG 
CTGTTTCGGTGTAATTGTATCGCTCCANTCCCACAACAACAGCCGAGCATAGGGTAAGCTGTGGT 

>SSA12.1.23(RC)

TGATACGCAGCTGTACCGAGCTCGATCCACTAGTACGCCCGCAGTGTGCTGGATTCGCCCTTAGAGTTTGATCCTGGCT 

CAGAACGAACGCTGGCGGCGTGGATAAGACATGCAAGTCGAACGGACTAGGTTTGTAGCAATACAGGCTTGGTTCAGT

GGCGCACGGGTGCGTAACACGTGAGCAATCTGCCGGGAAGTGGGGGATAGCTCGCCGAAAGGCGAATTAATACCGCAT 

ATGGCCATCCTTCGATTGGAGGAAAAGCCAAAGCAGCAATGCGCTTCCTGATGAGCTCGCGGCCTATCAGCTAGATGG 

CGGGGT  AACGGCCC  ACC  AT  GGCG  AT  GACGGGT  AGCT  GGT  CT  GAG  AGGACG  ACC  AGCC  AC  ACT  GG  AACT  GAG  ATACGG 

TCCAGACACCTACGGGTGGCAGCAGTCGAGAATTTTTCACAATGGGGGAAACCCTGATGGAGCGACGCCGCGTGGAGG 

ATGAAGGCCTTCGGGTTGTAAACTCCTGTCATGCGGGGAACAAGCAAGTGATAGTACCGCAAGAGGAAGAGACGGCTA 

ACTCTGTGCCAGCAGCCCCGGTCAAAGGGCGAATTCTGCAGATATCCATCACACTGGCGGCCGCTCGAGCATGCATCTA

G  AGGGGCCC  A  ATT  CGCCCT  AT  AGT  G  AGTCGT  ATT  AC  A  ATT  C  ACT  GGC 


Figure  5:  Editing  Step  3 

Identifying  primers  (yellow)  and  restriction  sites  (pink). 


The sequences were then uploaded to megaBlast (mega Basic Local Alignment Search
Tool) to determine the region with the strongest alignment to other sequences in the
BLAST database. The Hit Table output of BLAST lists all the matches to a particular
sequence, in order of highest alignment, and identifies the region of alignment for
each match. This region was identified in all sequences (Altschul 1990). Typically,
the region fell between the forward and reverse primers within the sequence; at times,
however, the region fell on the primer, and it therefore provided another means by
which we could recognize and remove flanking sequences that could skew the final
analyses. The program compared our unknown nucleotide sequences to known sequences
in a database with over 61 million sequences and calculated the statistical significance
of matches (National Resource for Molecular Biology Information 2007). Following
this final step, the portions of each sequence lying outside the primers, restriction
sites, and BLAST alignment region were deleted. This left only the ~500 bp 16S rRNA
insert for further analysis. An example of this editing step is shown in Figure 6.
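
Reading the alignment region from a hit table and trimming a query to it can be
sketched as follows. The column order (query id, subject ids, % identity, alignment
length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score)
follows the Fields header shown in Figure 6; the tab-separated layout, the function
names, and the choice of the first listed hit are assumptions made for this example.

```python
def best_hit_region(hit_table_text):
    """Return (q_start, q_end) of the first hit in a tabular hit table.

    Lines starting with '#' are header comments and are skipped. Hits are
    assumed to be listed best-first, so the first data line is used.
    """
    for line in hit_table_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Columns 7 and 8 (1-based) hold the query start and end.
        return int(fields[6]), int(fields[7])
    return None

def trim_to_region(seq, q_start, q_end):
    """Keep only bases q_start..q_end (1-based, inclusive) of seq."""
    return seq[q_start - 1:q_end]
```

BLAST coordinates are 1-based and inclusive, which is why the slice starts at
q_start - 1 and ends at q_end.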


[Screenshot of the NCBI BLAST results page. Job Title: 3 sequences (Apl0-2.L1.1.01.B01_07052311NQ...)]

# BLASTN 2.2.17 (Aug-26-2007)
# Query: Apl0-2.L1.1.01.B01_07052311NQ 768 14 768 CEQ
# Database: nr
# Fields: query id, subject ids, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 104 hits found
Apl0-2.L1.1.01.B01_07052311NQ  gi|828S0242|gb|DQ2S50S5.1|  92.94  524  14  21  134  647  1  511  0.0  741
Apl0-2.L1.1.01.B01_07052311NQ  gi|125967571|gb|EF392904.1|  92.12  533  18  24  134  657  1  518  0.0  732


>Apl0-2.L1.1.01.B01_07052311NQ  768  14  768  CEQ

GGAGTTGTTCACACGGGCCAGTGAGCGCGCTAATACGATCTCACTATAGGGCGAATTGGAGCTCCCGCGTTG 
CCACGCTACTAGAACTAGTGGATCCCCCGGGTCTTGCAGCACATTGTTGGAATTCGCCCTTAGAGTTTGATCCTGGCTC
AGAGT  G  AACGCTGGCGGC  AGGCT  A  AC  AC  AT  GC  AAGTCG  A  ACGGC  AGC  AC  AGGGGAGCTT  GCCTNGGGT  GGCG  AGT  GG 
CGGACGGGTGAGGAATACATGGGAATCTACCCTGTCGTGGGGGATAACGTAGGGAAACTTACGCTAATACCGCATACG 
ACCGAGAGTTGAAAGCGGCGGACCGAAGGCGTCACGCGACTGGATGAGCCCATGTCGGATTAGCTAGTTGGCGGGGTA 
AAGGCCCACCAAGGCGACGATCCGTAGCTGGTCTGAGAGGATGATCAGCCACACTTGGAACTGAGACACGGTCCAAGA 
CTCCTACGGGAGGCAGCAGTGGGGGAATATTGGACAATGGGCGCAAGGTATCCCAGCCATGCCGCGTGGGTGAAAGA 
AGGCCTTCGGGTTGTAAAGCCTTTTTGTCCCGGAAAGAAAAGCACGGGATTAAATACCCTCGTGTGATGACGGTACCCG 
GAAGAAATACGCAACCGGCTACCTTTCGTGTCAAGCAGCCCCCGGTTCAAAAGGGCGAAAATCCCACAAGTTGGAATA
TTCAAGGCCTAATCGGATAACCGTCGACCCTCGAGCGCGCGGGCCCGGTTACCAAGCCTTTTTGTTTCCCTT 

Figure  6:  Editing  Step  4 

Identifying  highest  alignment  region  using  megaBlast 

>Apl0-2.L1.1.01.B01_07052311NQ  768  14  768  CEQ

AGT  G  AACGCT  GGCGGC  AGGCTA  AC  AC  AT  GC  A  AGTCGAACGGC  AGC  AC  AGGGGAGCTT  GCCTNGGGT  GGCG  AGT  GGCG 

GACGGGTGAGGAATACATGGGAATCTACCCTGTCGTGGGGGATAACGTAGGGAAACTTACGCTAATACCGCATACGAC 

CGAGAGTTGAAAGCGGCGGACCGAAGGCGTCACGCGACTGGATGAGCCCATGTCGGATTAGCTAGTTGGCGGGGTAAA 

GGCCC  ACC  A  AGGCG  ACG  ATCCGTAGCTGGTCT  GAGAGG  AT  GAT  C  AGCC  AC  ACTT  GG  AACT  GAG  AC  ACGGT  CCA  AG  ACT 

CCT  ACGGG  AGGC  AGC  AGT  GGGGGAATATT  GGAC  AAT  GGGCGC  AAGGTATCCC  AGCC  AT  GCCGCGT  GGGT  GAAAG  AAG 

GCCTTCGGGTTGTAAAGCCTTTTTGTCCCGGAAAGAAAAGCACGGGATTAAATACCCTCGTGTGATGACGGTACCCGGA 

AGAAATACGCAACCGGCTACCTTTCGT 

>SSA12.1.18 

GGGAT  G  AACGCTAGCGGC  AGGCTT  AAT  AC  AT  GCA  AGT  CGT  GGGGC  AGC  AT  GTCCCGC  AGC  AAT  GCGGGAT  GAT  GGCGA 

CCGGCAAACGGGTGCGGAACACGTACACAACCTTCCTTTTAGTGGAGAATAGCCCAGGGAAACTTGGATTAATACTCC 

GTAACATATAAGAAGTGGCATCACTTTTATATTAAAGCAGCAATGCGCTGGAAGATGGGTGTGCGGCTGATTAGATAG 

TT  GGCGGGGT  A  ACGGCCC  ACC  AAGT  CG  ACGAT  C  AGT  A  ACT  GGT  GT  GAG  AGC  ACGACC  AGT  C  AC  ACGGGC  ACTGAG  AC  A 

CGGGCCCGACTCCTACGGGAGGCAGCAGTAAGGAATATTGGTCAATGGACGCAAGTCTGAACCAGCCATGCCGCG 

>SSA12.1.23(RC) 

A  ACGAACGCT  GGCGGCGT  GGAT  AAGAC  AT  GCA  AGT  CG  A  ACGG  ACTAGGTTT  GT  AGC  A  AT  AC  AGGCTT  GGTT  C  AGT  GGC 
GC  ACGGGT  GCGT  AAC  ACGT  G  AGC  A  AT  CT  GCCGGGAAGT  GGGGGAT  AGCTCGCCG  AAAGGCG  AATT  AAT  ACCGC  ATAT  G 
GCC  AT  CCTTCGATT  GGAGGAA  AAGCC  AAAGC  AGC  AAT  GCGCTTCCT  GAT  GAGCTCGCGGCCT  AT  C  AGCT  AGAT  GGCGG 
GGTAACGGCCCACCATGGCGATGACGGGTAGCTGGTCTGAGAGGACGACCAGCCACACTGGAACTGAGATACGGTCCA 
GAC  ACCTACGGGT  GGC  AGC  AGT  CGAG  AATTTTT  C  AC  AAT  GGGGGA  AACCCT  GAT  GGAGCG  ACGCCGCGT  GGAGG  AT  GA 
AGGCCTTCGGGTTGTAAACTCCTGTCATGCGGGGAACAAGCAAGTGATAGTACCGCAAGAGGAAGAGACGGCTAACTC 
T  GT  GCC  AGC  AGCCCC 


Figure  7:  Edited  and  Trimmed  Sequences 
The editing process outlined above was a crucial part of this project. The sequences
used for the DOTUR analysis had to meet all of the criteria mentioned above. The
software packages do not verify the input sequences provided to them; the validity of
the software output therefore depends on the editing process applied to the input.
Figure 8 below is a flow chart that describes the procedures the raw sequences underwent
and the various analyses performed.
[Flow chart; recoverable box labels: RAW; Assign to phyla by RDP Classifier; 2,820;
Input to Phylip Distance Matrix; Good’s Coverage/Evenness]

Figure 8: Schematic of Sequence Analysis
Analysis

The 3,099 sequences remaining after trimming and editing were aligned with the
RDP release 9.57 aligner. This aligner was the only online program able to handle the
number of sequences in this project. The data were separated into subsets representing
the comparisons needed to answer the research questions: control versus planted
mesocosms, plant species, and depth. These groupings of sequences were uploaded to
the aligner, a process that took 10 days to complete.

The RDP Classifier results were used to construct pie charts in Excel to address
each research question. The pie charts divided the phyla represented in each
community into 9 slices. At times, phyla with low representation were grouped together
to make the graph clearer. Each pie chart also had a summary table listing each
phylum. The pie charts and tables are presented in Chapter IV under their respective
research questions. To test whether the community phyla classifications were
statistically different, ANOSIM was performed on the RDP phylum classifications. If the
p value was greater than 0.05, the two communities being compared were not considered
statistically different.

The literature review presented the different parameters used in this project. The
sequences remaining after trimming and editing were used to calculate richness
parameters, evenness, and Good’s coverage. These calculations become complicated with
such a large number of sequences, so the DOTUR program was used to calculate the
parameters. DOTUR requires a distance matrix as input. The aligned data were
downloaded from the RDP site in Phylip format. The data subsets that numbered greater
than 2,000 sequences were downloaded by the RDP staff due to program limitations.
The Phylip file for each subset of the data was used as the input file for the Phylip
version 3.2 DNADIST program, which used the Jukes-Cantor method to create a distance
matrix. This distance matrix was used to run the DOTUR software.
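
For readers unfamiliar with the Jukes-Cantor method, the correction converts the
observed proportion of differing sites p between two aligned sequences into an
estimated distance d = -(3/4) ln(1 - 4p/3). The sketch below illustrates the formula
only; it is not the DNADIST implementation. The function names are hypothetical, and
skipping columns that contain gaps or ambiguous bases is an assumption of this example.

```python
import math

def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap or an ambiguous base are
    skipped (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    different = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                different += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return different / compared

def jukes_cantor(seq_a, seq_b):
    """Jukes-Cantor distance: d = -(3/4) * ln(1 - 4p/3)."""
    p = p_distance(seq_a, seq_b)
    if p == 0:
        return 0.0
    if p >= 0.75:
        # The correction is undefined once p reaches 3/4.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The corrected distance is always at least as large as p, because it allows for
multiple substitutions at the same site.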

Once the distance matrix was created, it was saved as a distance file and used as
the input to the DOTUR program. DOTUR created 23 output files, which included the
ACE, Chao 1, and rarefaction data. These files were used to create graphs and perform
calculations to answer the research questions of this project.
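
As an illustration of what a richness estimator computes, the Chao 1 estimate can be
obtained from three numbers: the observed number of OTUs, the number of OTUs seen
exactly once, and the number seen exactly twice. The sketch below uses the
bias-corrected form of the estimator; the text does not state which variant DOTUR
applied, so this choice and the function name chao1 are assumptions of the example.

```python
def chao1(s_obs, n1, n2):
    """Bias-corrected Chao 1 richness estimate.

    s_obs: number of OTUs observed
    n1:    number of OTUs observed exactly once (singletons)
    n2:    number of OTUs observed exactly twice (doubletons)
    """
    return s_obs + n1 * (n1 - 1) / (2.0 * (n2 + 1))
```

The estimate grows quickly when singletons greatly outnumber doubletons, which is the
situation expected for an undersampled community.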

DOTUR constructs *.c* files to plot collector’s curves. These files are organized
so that the first column is the number of sequences sampled. For each evolutionary
distance, the next three columns give the mean value of the parameter and the upper
and lower bounds of its 95% confidence interval. At times a confidence interval was
difficult to define, so a zero was placed in that particular spot (DOTUR 2005).

The *.c* files for the parameters used in this project were used to construct
collector’s curves at the 3% evolutionary distance (species level) and the 20%
evolutionary distance (phylum level), measured from other sequences within the
samples. These graphs were used for comparison and addressed each of the research
questions.

As previously mentioned, diversity consists of two parts: richness and evenness.
The ACE, Chao 1, and rarefaction data from DOTUR were used to construct curves to
address richness. Evenness, however, was calculated by a simple formula. Evenness is
considered the relative distribution of individuals among certain predefined units, such
as species. There are numerous ways to determine evenness. This project used the most
popular formula for evenness: the ratio of the Shannon index to the maximum value the
index can take for the observed number of OTUs, which occurs when individuals are
spread equally among the OTUs, for example when only one individual occupies each OTU
(Kennedy 1995). The Shannon index was calculated by DOTUR and was located in the
Shannon *ltt* file. The average Shannon index values for the 3% and 20% evolutionary
distances were used in the evenness calculations. Each value was divided by ln(S),
where S is the total number of species (OTUs) at that evolutionary distance. The error
was propagated by using the relative error from both the Shannon index and the S value.
The 95% upper and lower confidence intervals were provided by DOTUR (Schloss &
Handelsman 2005).
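
The evenness calculation described above amounts to J = H / ln(S), where H is the
Shannon index and S is the number of OTUs. The sketch below illustrates the arithmetic
under the assumption that OTU abundances are available as a plain list of counts. In
this project the Shannon index came directly from DOTUR, so the function names here are
hypothetical and the code is not the procedure the authors ran.

```python
import math

def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over OTUs with nonzero counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h

def pielou_evenness(counts):
    """Evenness J = H / ln(S), where S is the number of observed OTUs."""
    s = sum(1 for c in counts if c > 0)
    if s < 2:
        raise ValueError("evenness is undefined for fewer than two OTUs")
    return shannon_index(counts) / math.log(s)
```

A community in which every OTU has the same abundance gives J = 1, while a community
dominated by a single OTU gives a value close to 0.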

Good’s coverage was determined by the traditional formula $C = 1 - n_1/N$ (Good
1953), where $C$ was Good’s coverage, $N$ was defined as the community size, and $n_1$
was defined as the number of phylotypes appearing only once. The coverage was
calculated for each plant species, each depth, the control, the compiled planted data,
and all the data.
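
Good’s coverage, C = 1 - n1/N, with n1 the number of phylotypes appearing only once and
N the community size, is straightforward to compute. The sketch below is illustrative
only, and the function name goods_coverage is hypothetical. The example values 566 and
959 are the singleton count and library size listed for Scirpus atrovirens at 97%
similarity in Table 3.

```python
def goods_coverage(n_singletons, n_total):
    """Good's coverage C = 1 - n1/N.

    n_singletons: number of phylotypes (OTUs) that appear only once
    n_total:      total number of sequences in the library
    """
    if n_total <= 0:
        raise ValueError("library size must be positive")
    return 1.0 - n_singletons / n_total

# Scirpus atrovirens library at 97% similarity (values from Table 3).
coverage = goods_coverage(566, 959)  # about 0.41
```

A value near 0.41 means that roughly four in ten sequences belong to an OTU that was
sampled more than once, which is consistent with low coverage at the species level.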
Chapter IV: Results and Analysis


Overview 

Data for mesocosms planted with the same species were pooled to construct a 16S
rRNA community for each comparison of interest: planted vs. unplanted, plant species,
and depth within those groups. The main research objectives for this project were to
determine whether plant presence, plant species, or depth significantly affected
microbial community composition or diversity in the mesocosms. Several diversity
parameters were used to answer these questions. This section summarizes the diversity
parameters and analyses, and the outcomes of those analyses. It begins with a general
look at the diversity of all of the sequence samples and is then organized by research
question.

Once all similar mesocosms were grouped together, the sequences fell into the
categories summarized in Table 2. The sequences were not evenly distributed because of
the uneven planting scheme, in which there were four columns with S. atrovirens, three
with E. erythropoda, two with C. comosa, and three unplanted controls. The trimming
and editing process described in Chapter III left 3,099 sequences. These sequences were
assigned to phyla by the RDP Classifier program using an 80% match to sequences
within the RDP database. After RDP alignment was executed, a total of 2,820
sequences were left for DOTUR analysis. A total of 263 sequences (8.5%) failed to
align due to RDP aligner program limitations (RDP staff 2007). Another 0.5% of the
sequences fell into an Unclassified Root category, which is explained later in this
section. Neither the sequences that failed to align nor the Unclassified Root sequences
were used in the DOTUR analyses.




Table 2: Sequence Breakout

                                        Carex     Eleocharis     Scirpus
                                        comosa    erythropoda    atrovirens    Control    Total
Sequences after trimming and editing     506         756           1076          761      3099
Sequences after alignment                471         695            959          695      2820

It was immediately evident that each mesocosm microbial community, even in the
control columns, was extremely species rich, and that the sequences used to
characterize each community came from just a small sample of the entire community.
Table 3 below shows that an average of 65% of all the sequences appeared only one time
in each community at a sequence similarity of 97% (species level), and Table 4 shows
that an average of 25% appeared only one time at a sequence similarity of 80%
(phylum level).


Table 3: Frequency Distribution of OTUs at 97% Similarity

                                                      Number of OTUs with Nx sequences
                          Number of    Number of
Community                 Sequences    unique OTUs     N1     N2     N3     N4     N5    N>5
Scirpus atrovirens           959          657          566     56     12      8      6      9
Carex comosa                 471          381          331     37      8      1      0      4
Eleocharis erythropoda       695          585          510     53     15      2      4      1
Control                      695          528          442     62     13      4      2      5
Table 4: Frequency Distribution of OTUs at 80% Similarity

                                                      Number of OTUs with Nx sequences
                          Number of    Number of
Community                 Sequences    unique OTUs     N1     N2     N3     N4     N5    N>5
Scirpus atrovirens           959          197           68     40     29     17     10     33
Carex comosa                 471          130           51     30     12     10      4     23
Eleocharis erythropoda       695          190           77     35     23     16     10     29
Control                      695          178           71     39     15     13     12     28


The first step was to characterize the community composition. This was done by
comparing the sample sequences to the RDP database of known sequences. Figure 9
depicts the phyla into which the 3,099 sequences fell using the RDP Classifier
program. This figure illustrates the community composition for all of the sequences
combined. This summed microbial community composition across the mesocosms
models the soil of the constructed wetland at WPAFB.
[Pie chart; the only slice label recovered is "Proteo. 38.2%"]

Figure 9: Phyla Classification for all Data using RDP Classifier
Abbreviations: Acido., Acidobacteria; Actino., Actinobacteria; Bacter., Bacteroidetes; Chloro.,
Chloroflexi; Firm., Firmicutes; Gemma., Gemmatimonadetes; Lenti., Lentisphaerae; Nitro., Nitrospira;
Plant., Planctomycetes; Proteo., Proteobacteria; Spiro., Spirochaetes; Unclass., Unclassified Bacteria;
Verr., Verrucomicrobia.

Of the 3,099 sequences used in the RDP Classifier analysis, 99.48% were
identified as belonging to the domain Bacteria, with 18 distinct phyla and an
Unclassified Bacteria category. The remaining 0.52% fell into an Unclassified Root
category. Unclassified Root refers to sequences for which the RDP Classifier program
could not determine whether they were bacterial 16S rRNA. These may have been
non-16S genes, rRNA genes from non-bacteria, or sequences of low quality (RDP Staff
2007). This category was not shown on any of the pie charts in this section and was
also eliminated from the DOTUR analyses.

Of the sequences, 28.4% fell into the Unclassified Bacteria category, which meant
that random subsets of the query sequence did not match sequences within the RDP
database at least 80% of the time. The remaining sequences were assigned to a phylum.
The largest group, 38.2%, was Proteobacteria. Although phylum richness was high, with
18 different phyla represented, the abundance was not even. The prevalent phyla other
than Proteobacteria were Acidobacteria (13.7%) and Bacteroidetes (4.7%). It is
important to mention that phyla known to contain dehalogenators, Chloroflexi and
Firmicutes, were present in very small numbers.

The second step was to characterize the diversity of the sample sequences. This
analysis was performed using DOTUR, in which the sample sequences were compared to
each other. A rarefaction curve and the ACE and Chao 1 estimators were calculated for
the entire data set. The graphs for the species and phylum levels are shown in Figure
10. The species-level graph did not reach an asymptote, whereas the phylum-level graph
did reach an asymptote for the ACE and Chao 1 estimators and the rarefaction curve.
The lack of an asymptote indicates high richness and that the total population was
undersampled. It was apparent that the total community was very diverse and that the
community as a whole was probably undersampled in this project, especially at the
species level.
Figure 10: All Richness Estimates and Rarefaction Curves
ACE (diamonds) and Chao 1 (squares) richness estimators at the species level (A) and phylum level (B)
for all the data. Rarefaction values (triangles) based on observed OTUs.

Another important point to establish was that the sampling effort was adequate to
provide quality data for interpretation. Good’s coverage was calculated for each
comparison. Figure 11 below summarizes the coverage for the entire data set. As
expected, the phylum-level coverage was high relative to the species-level coverage.
The phylum-level coverage averaged 92%; therefore, the parameters calculated for the
phylum level came from a population that had been sampled thoroughly.


[Bar chart; x-axis (Libraries): All, All Planted, Blank, Eleo, Carex, Scirpus, Depth 1, Depth 2, Depth 3]

Figure 11: Good’s Coverage

Light green bars represent the phylum level. Blue bars represent the species level.

Evenness  was  also  an  important  aspect  that  was  investigated.  Figure  12  below 
summarizes  the  results  for  the  entire  data  set.  Evenness  was  calculated  with  the  Pielou 
equation  presented  in  Chapter  III.  The  error  bars  represent  the  propagated  error  for  each 
constituent  in  the  formula.  The  error  was  calculated  by  DOTUR. 
Figure 12: Evenness

Species level (blue) and phylum level (light green).

Research Objective 1: Determine the effects of plant presence with regard to
microbial diversity and dominance

The first step was to use the RDP Classifier function to identify all the DNA sequences
that could be matched to a known species of microorganism within the RDP database.
This characterized the community composition for both the control and planted
communities. The results are summarized in Figure 13 below.
[Pie charts (A) and (B); the only slice label fragment recovered is "BRC1, WS3,"]

Figure 13: Phyla Classification for all Control sequences (A) and all Planted sequences (B) using
RDP Classifier.
Abbreviations: Acido., Acidobacteria; Actino., Actinobacteria; Bacter., Bacteroidetes; Chloro., Chloroflexi;
Firm., Firmicutes; Gemma., Gemmatimonadetes; Lenti., Lentisphaerae; Nitro., Nitrospira; Plant.,
Planctomycetes; Proteo., Proteobacteria; Spiro., Spirochaetes; Unclass., Unclassified Bacteria; Verr.,
Verrucomicrobia.
The charts illustrate that the microbial composition for the known sequence
matches is very similar for the planted and control data, even though the planted
community had about three times as many sequences as the control (see Table 2).
Table 5 below summarizes the actual percentage of each phylum. Several interesting
trends can be seen in the table. Two phyla, TM7 and Lentisphaerae, were represented
in the control community but were not found in the planted community; each appeared
one time. Since the sequences produced during this experiment are representative of
the dominant phyla within the soil samples, the presence of even one individual was
important to document.


Table 5: Phyla Classification Percentages (Control vs. Planted)

Phyla                           Control    Planted
TM7                               0.13       0
OP11, OP10, OD1, WS3, BRC1        1.18       0.98
Verrucomicrobia                   2.5        2.95
Firmicutes                        1.31       1.32
Spirochaetes                      0.26       0.38
Planctomycetes                    0.92       0.86
Bacteroidetes                     4.2        4.79
Lentisphaerae                     0.13       0
Actinobacteria                    2.5        3.04
Nitrospira                        1.18       1.07
Chloroflexi                       3.55       3.04
Acidobacteria                    16.16      12.87
Proteobacteria                   34.95      39.35
Gemmatimonadetes                  1.05       0.64
Unclassified Bacteria            29.7       28.06
To determine whether microbial community composition differed statistically, we
analyzed the RDP Classifier data using ANOSIM. The analysis revealed no significant
difference between the planted and control data (n=5000 permutations; p=0.75). The
outcome was the same when unclassified sequences were dropped from the phylum-level
analysis.

The second step was to characterize diversity using DOTUR analysis. Evenness
was summarized in Figure 12. Evenness was high for both communities at the phylum
and species levels. This, combined with the low Good’s coverage at the species level,
indicated that the species level was vastly undersampled. However, the phylum-level
Good’s coverage was high, which indicated that the sampling effort was adequate to
make a confident assessment at this level. The richness parameters used for analysis
in this project showed some differences between the planted and control communities.
Figure 14 shows estimates of richness at the 3% distance level for species (A) and the
20% distance level for phylum (B).

Figure 14: Control and Planted Data Richness Estimates and Rarefaction Curves
Chao 1 (diamonds) and ACE (squares) richness estimators at the species (A) and the phylum level (B)
for Control (top) and Planted (bottom) data. Rarefaction values (triangles) based on observed OTUs.

From these graphs we can make some important observations. The planted
sequences had a much higher richness estimate than the controls. At the species level,
the Chao 1 and ACE estimates were 4500 OTUs higher for the planted sequences. However,
none of the three species level richness curves reached an asymptote, which again shows
that the species level was undersampled. In the phylum graphs the Chao 1 and ACE
estimates were somewhat closer for the planted and control communities, and both the
estimators and the rarefaction curve did reach an asymptote. The planted community was
still much higher than the control; however, this could be due to the larger sample size
for the planted data, which had 1430 more sequences than the control. Therefore, a
look at the rarefaction curves for 695 random sequences from each group was warranted.
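
Comparing two libraries at a common subsample size can be sketched as follows. Here the expected number of OTUs in a random subsample is computed from the analytic rarefaction formula, E[S_n] = sum over OTUs of 1 - C(N - N_i, n) / C(N, n), rather than by repeated random draws. This is an illustration and not necessarily how the curves in this research were generated. The function name and the OTU counts are hypothetical.

```python
from math import comb


def rarefied_richness(otu_counts, depth):
    """Expected number of OTUs observed when `depth` sequences are drawn
    without replacement from a library with the given per-OTU counts."""
    total = sum(otu_counts)
    if not 0 <= depth <= total:
        raise ValueError("depth must be between 0 and the library size")
    denom = comb(total, depth)
    # comb(total - c, depth) / denom is the chance that an OTU with c
    # sequences is missed entirely by the subsample.
    return sum(1 - comb(total - c, depth) / denom for c in otu_counts)


if __name__ == "__main__":
    # Hypothetical libraries of unequal size (not data from this study).
    large_library = [40, 25, 20, 10, 5, 3, 2, 1, 1, 1]
    small_library = [15, 10, 5, 2, 1]
    common_depth = sum(small_library)  # rarefy both to the smaller library
    for name, counts in (("large", large_library), ("small", small_library)):
        print(name, round(rarefied_richness(counts, common_depth), 2))
```

Rarefying both libraries to the size of the smaller one removes the advantage that a larger library has simply because more sequences were collected.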


Figure 15: Phylum Level Rarefaction Curve for Control and Planted Data
Planted (diamonds) and Control (squares) rarefaction values based on observed OTUs at the phylum level.


In the phylum level analysis, the rarefaction curve approaches an asymptote
for both data sets. This indicates that the sampling effort was adequate to make a
reliable estimate of richness at the phylum level. The planted sequences had a higher
richness than the control data, even when a random 695 sequences were taken from both
the planted and control communities. The error bars show that at sample sizes below 350
sequences the communities are not statistically different, because the bars overlap.
As the sample size increases above 350 sequences, however, the error bars no longer
overlap and the richness values are statistically different. This analysis shows that
while the microbial community composition of known microorganisms at the phylum
level did not differ between the planted and control libraries, richness at the phylum
level was affected by plant presence.
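
The error bar comparison used in this chapter can be written as a simple check, sketched below. Non-overlap of two 95% confidence intervals is a conservative indication of a difference, while overlapping intervals do not by themselves prove that no difference exists. The function names and interval values are hypothetical, and measuring the overlap relative to the shorter interval is an assumption, since the thesis does not state how the overlap fraction was measured.

```python
def intervals_overlap(a, b):
    """True if the closed intervals a = (low, high) and b = (low, high) share any point."""
    (a_low, a_high), (b_low, b_high) = a, b
    return a_low <= b_high and b_low <= a_high


def overlap_fraction(a, b):
    """Length of the shared region divided by the length of the shorter interval
    (assumed convention)."""
    (a_low, a_high), (b_low, b_high) = a, b
    shared = max(0.0, min(a_high, b_high) - max(a_low, b_low))
    shorter = min(a_high - a_low, b_high - b_low)
    if shorter == 0:
        return 1.0 if intervals_overlap(a, b) else 0.0
    return shared / shorter


if __name__ == "__main__":
    # Hypothetical 95% intervals on phylum richness at one sample size.
    planted_ci = (13.0, 18.0)
    control_ci = (10.0, 14.0)
    print(intervals_overlap(planted_ci, control_ci))  # intervals share 13 to 14
    print(overlap_fraction(planted_ci, control_ci))   # 1.0 / 4.0
```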

Another trend seen here was that the confidence intervals did not get smaller as
the sample size increased, as would normally be expected. This indicates that
the variance of the data also increases with sample size. This phenomenon
suggests that the communities are extremely rich, so that a stable estimate of variance
could not be obtained from the samples collected.

Research  Objective  2:  Determine  the  effects  of  plant  species  with  regards  to 
microbial  diversity  and  community  composition 

The  first  step  was  using  the  RDP  classifier  function  to  identify  all  the  DNA  sequences 
that  could  be  matched  to  a  known  species  of  microorganism  within  the  RDP  database. 
This  characterized  the  community  composition  for  the  plant  species  communities.  The 
results  are  summarized  in  Figure  16  below. 

Figure 16: Phyla Classification for all Scirpus atrovirens sequences (A), all Carex comosa sequences
(B), and all Eleocharis erythropoda sequences (C) using RDP Classifier
Abbreviations: Acidobacteria; Actino., Actinobacteria; Bacter., Bacteroidetes; Chloro., Chloroflexi;
Firm., Firmicutes; Gemma., Gemmatimonadetes; Lenti., Lentisphaerae; Nitro., Nitrospira; Plant.,
Planctomycetes; Proteo., Proteobacteria; Spiro., Spirochaetes; Unclass., Unclassified Bacteria; Verr.,
Verrucomicrobia.

Table 6: Phyla Classification Percentages for Scirpus atrovirens sequences (A),
Carex comosa sequences (B), and Eleocharis erythropoda sequences (C)

Phyla                         Carex (%)        Eleocharis (%)   Scirpus (%)
TM7                              0                0                0
OP11, OP10, OD1, WS3, BRC1       0.59             0.92             1.21
Verrucomicrobia                  2.37             4.36             2.23
Firmicutes                       3.36             1.19             0.46
Spirochaetes                     0.2              0.53             0.37
Planctomycetes                   0.4              0.93             1.02
Bacteroidetes                    4.35             4.63             5.2
Lentisphaerae                    0                0                0
Actinobacteria                   3.16             2.51             3.34
Nitrospira                       1.38             1.32             0.74
Chloroflexi                      2.77             3.57             2.79
Acidobacteria                   17.19            13.36            10.5
Proteobacteria                  33.4             33.99            45.91
Gemmatimonadetes                 0.79             0.53             0.65
Unclassified Bacteria           29.44            31.22            25.18

The  purpose  of  this  analysis  was  to  note  any  changes  in  microbial  composition 
between  the  different  species  of  plants  at  the  phylum  level.  Although  the  composition 
was  very  similar,  there  were  some  slight  differences.  The  phylum  Firmicutes  represents 
3.4%  of  the  sequences  of  the  Carex  comosa  mesocosm  samples,  but  only  0.46%  and 
1.2%  of  the  Scirpus  atrovirens  and  Eleocharis  erythropoda  communities,  respectively. 
Firmicutes  is  a  phylum  known  to  contain  dehalogenators.  Since  this  mesocosm  study 
mimics  a  constructed  wetland  treating  a  PCE  and  TCE  plume,  the  presence  of 
dehalogenators  was  expected. 

The only other differences were in the phyla Verrucomicrobia and
Proteobacteria. The Eleocharis erythropoda mesocosm samples had a 4.4%
representation of Verrucomicrobia, while the Carex comosa and Scirpus atrovirens
samples had 2.4% and 2.2%, respectively. The most prevalent phylum for all the plant
species was Proteobacteria. However, Scirpus atrovirens had 45.9% representation,
while the other two plant species averaged only 33.7%.

In order to understand whether microbial community composition differed
statistically, we analyzed the RDP Classifier data using ANOSIM. The analysis revealed no
significant differences among the plant species data (n = 5000 permutations; p = 0.21).
The outcome was the same when unclassified sequences were dropped from the analysis.

The second step was to characterize diversity using DOTUR analysis. Evenness
is summarized in Figure 12. Evenness was high for all the communities at the phylum and
species levels. This, combined with the low Good’s coverage at the species level,
indicated that the species level was vastly undersampled. However, Good’s coverage at
the phylum level was high, which indicated that the sampling effort was adequate to make
a confident assessment at this level. The richness parameters used in this project
differed somewhat among the plant species communities. Figure 17 shows estimates of
richness at the 3% distance level for species (A) and the 20% distance level for
phylum (B).

Figure 17: Plant Species Data Richness Estimates and Rarefaction Curves
Chao 1 (squares) and ACE (diamonds) richness estimators at the species (A) and the phylum level (B)
for Eleocharis erythropoda (top), Carex comosa (middle), and Scirpus atrovirens (bottom) data.
Rarefaction values (triangles) based on observed OTUs.

The richness estimators showed some interesting trends. The ACE estimator
predicted the highest richness in all cases, while the observed richness (as shown by the
rarefaction curves) was always well below either estimator. This is because the
rarefaction curve reflects only the richness actually observed in the samples, whereas
the ACE and Chao 1 estimators estimate the true richness of the community that was
sampled. The Scirpus atrovirens community had a much higher ACE and Chao 1 estimate than
the other two communities. This shows that more OTUs were identified in this community.
These richness estimators had a slight difference in their values, suggesting that plant
species had an effect on microbial richness in the mesocosms. Eleocharis erythropoda had
the second highest richness, while Carex comosa had the lowest richness of the plant
species. However, the estimators did vary with sampling effort and were not vastly
different from each other. This was expected because all the plant species used in this
project were from the same family and had the same growth habit.
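
The difference between observed and estimated richness can be made concrete with a small sketch of the Chao 1 estimator. The bias-corrected form S_obs + F1(F1 - 1) / (2(F2 + 1)) is used here, where F1 and F2 are the numbers of OTUs seen exactly once and exactly twice. Whether this exact form matches the one reported in this research is an assumption, and the counts below are hypothetical. ACE is not shown.

```python
def chao1(otu_counts):
    """Bias-corrected Chao 1 richness estimate from per-OTU sequence counts."""
    observed = sum(1 for c in otu_counts if c > 0)
    singletons = sum(1 for c in otu_counts if c == 1)
    doubletons = sum(1 for c in otu_counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


if __name__ == "__main__":
    # Hypothetical library: seven observed OTUs, three of them seen only once.
    counts = [10, 4, 2, 2, 1, 1, 1]
    print("observed OTUs:", sum(1 for c in counts if c > 0))
    print("Chao 1 estimate:", chao1(counts))
```

The estimate exceeds the observed count whenever there are at least two singleton OTUs, which is why the estimator curves sit above the rarefaction curves in undersampled libraries.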

Figure 18: Phylum Level Plant Species Data Rarefaction Curve
Rarefaction values based on observed OTUs for Eleocharis erythropoda (squares), Carex comosa
(diamonds), and Scirpus atrovirens (triangles) at the phylum level.

In Figure 17, it is important to notice that the species level rarefaction data
never reached an asymptote, indicating undersampling of the total population. However,
the phylum level rarefaction data did reach an asymptote for each of the plant species.
Figure 18 summarizes the phylum rarefaction data calculated by DOTUR from the
samples taken. The Eleocharis erythropoda data had the highest richness, followed by
Scirpus atrovirens; Carex comosa had the lowest richness. The error bars in this figure
represent the 95% confidence interval. The error bars overlap for every pair of plant
species except Eleocharis and Carex. Therefore, the Eleocharis and Carex
communities differ in phylum richness. The communities were not sampled
evenly, but the trend illustrated in Figure 18 seems to be that the overlap decreased as
the sample size increased. This could indicate that the plant species do differ in
richness at higher sample sizes.

The unexpected trend of stable confidence intervals with increasing sample size,
previously discussed in Research Objective 1, was also seen here. This indicates that the
true richness of these communities is extremely high. The sample size used here was not
sufficiently large to establish a consistent estimate of variance.

Research  Objective  3:  Determine  the  effects  of  soil  depth  with  regards  to  microbial 
diversity  and  community  composition 

The  first  step  was  using  the  RDP  classifier  function  to  identify  all  the  DNA  sequences 
that  could  be  matched  to  a  known  species  of  microorganism  within  the  RDP  database. 
This  characterized  the  community  composition  for  the  depth  communities.  The  results 
are  summarized  in  Figure  19  below. 

Figure 19: Phyla Classification for all Depth 1 sequences (A), all Depth 2 sequences (B), and all
Depth 3 sequences (C) using RDP Classifier
Abbreviations: Acidobacteria; Actino., Actinobacteria; Bacter., Bacteroidetes; Chloro., Chloroflexi;
Firm., Firmicutes; Gemma., Gemmatimonadetes; Lenti., Lentisphaerae; Nitro., Nitrospira; Plant.,
Planctomycetes; Proteo., Proteobacteria; Spiro., Spirochaetes; Unclass., Unclassified Bacteria; Verr.,
Verrucomicrobia.

Table 7: Phyla Classification Percentages

Phyla                         Depth 1 (%)      Depth 2 (%)      Depth 3 (%)
TM7                              0                0.11             0
OP11, OP10, OD1, WS3, BRC1       1.09             1.14             0.98
Verrucomicrobia                  1.81             2.86             3.8
Firmicutes                       1.72             0.69             1.43
Spirochaetes                     0.45             0.23             0.36
Planctomycetes                   0.90             0.92             0.80
Bacteroidetes                    4.62             4.69             4.73
Lentisphaerae                    0                0.12             0
Actinobacteria                   2.54             2.97             3.21
Nitrospira                       0.90             1.49             1.07
Chloroflexi                      4.35             2.40             2.59
Acidobacteria                   11.78            14.07            15.25
Proteobacteria                  38.50            37.99            38.27
Gemmatimonadetes                 0.54             0.57             1.07
Unclassified Bacteria           30.34            29.29            25.96

The purpose of this analysis was to investigate whether microbial community
composition differed among the depths. Although the composition was very similar, there
were some slight differences. Depth 1 corresponds to the bottom of the mesocosm.
Chloroflexi represented 4.4% of the Depth 1 sequences, while the middle and top depths
(Depth 2 and Depth 3, respectively) were both around 2.5%. This could indicate that the
bottom layers of the mesocosms are richer in Chloroflexi. It is also important to mention
that the most prevalent phylum at all depths was Proteobacteria, at an average of 38.2%,
and that Unclassified Bacteria made up an average of 28.5% of the depth communities.

In order to understand whether there were statistically significant differences in
microbial community composition, we analyzed the RDP Classifier data using ANOSIM.
The analysis revealed no significant differences among depths (n = 5000 permutations;
p = 0.31). The outcome was the same when unclassified sequences were dropped from the
analysis.

The second step was to characterize diversity using DOTUR analysis. Evenness
is summarized in Figure 12. Evenness was high for all three communities at the phylum
and species levels. This, combined with the low Good’s coverage at the species level,
indicated that the species level was vastly undersampled. However, Good’s coverage at
the phylum level was high, which indicated that the sampling effort was adequate to make
a confident assessment at this level. The richness parameters used in this project
differed somewhat among the depth communities. Figure 20 shows estimates of richness at
the 3% distance level for species (A) and the 20% distance level for phylum (B).

Figure 20: Depth Data Richness Estimates and Rarefaction Curves
Chao 1 (squares) and ACE (diamonds) richness estimators at the species (A) and the phylum level (B)
for Depth 1 (top), Depth 2 (middle), and Depth 3 (bottom) data. Rarefaction values (triangles)
based on observed OTUs.

The richness estimators summarized in these graphs showed some slight trends
but no strong evidence that the depths differed in diversity. The middle depth was
slightly lower in both the ACE and Chao 1 estimators. This suggests that the middle
depth had lower species richness than the top and bottom layers. In Figure 20, it is
important to notice that the species level rarefaction curve never reached an asymptote,
indicating undersampling of the total population. However, the phylum level rarefaction
data did reach an asymptote for each of the depths. A closer look at the phylum
level rarefaction data in Figure 21 below uncovers an interesting trend.

Figure 21: Community Phylum Level Depth Rarefaction Curves
Rarefaction values based on observed OTUs at phylum level. Middle depth (squares), Bottom depth
(diamonds), and Top depth (triangles). Carex comosa (A); Eleocharis erythropoda (B); Scirpus
atrovirens (C); Control (D).

The middle depth in all libraries reached an asymptote at a lower value. This indicates
that the middle depth had lower diversity than both the top and bottom layers at the
phylum level. The error bars on these curves represent the 95% confidence interval
calculated by DOTUR. The error bars all overlap by more than 50% except in the Carex
and control communities. This indicates that, for these two communities, the middle
layer was significantly different in richness from the other two layers. However, as
sampling effort increased, the layers in all communities did start to split apart. This
suggests that richness was lower in the middle layer of all the communities, a trend that
should be investigated in future research. The trend of stable confidence intervals
noted in the first two objectives also applied here: the intervals did not get smaller
with increased sampling effort. This indicated that the total population was extremely
diverse and that a much larger sample would have to be taken. All richness estimator and
rarefaction curves are included in Appendix G.

Chapter V: Conclusions and Recommendations


Overview 

This  chapter  summarizes  the  conclusions  and  recommendations  from  this 
research.  All  three  research  objectives  are  reviewed  and  the  conclusions  for  each  are 
discussed.  Also  this  chapter  reviews  the  significance  of  this  research  and  the  contribution 
it  made  to  the  literature  in  this  area.  This  chapter  ends  with  recommendations  for  further 
research. 

This research focused on characterizing the microbial community composition
and diversity of soil communities in constructed mesocosms prior to contamination with
PCE. The mesocosm construction was based on a subsurface flow wetland remediating a
PCE  and  TCE  plume  on  WPAFB,  OH,  but  the  mesocosms  were  built  with 
uncontaminated  soil.  Evidence  had  already  shown  that  the  wetland  was  remediating  the 
groundwater  plume  (Amon  2007).  Therefore,  it  was  expected  that  phyla  containing 
known  dehalogenators  would  be  represented  in  the  non-contaminated  sample  sequences. 
Dehalogenators  and  other  anaerobic  organisms  facilitate  the  first  stage  of  PCE  and  TCE 
remediation. 

From the 3,099 sample sequences used for RDP phyla classification, 3.33% of the
sequences belonged to two phyla known to contain dehalogenators and anaerobic
bacteria. The phylum Chloroflexi contains Dehalococcoides, a known
dehalogenator, and the phylum Firmicutes contains anaerobic, Gram-positive organisms
with low G+C content (Fields 2004; Bik et al. 2006). Therefore, these phyla contain
organisms that can transform PCE and TCE and contribute to their remediation at this
site.

Research Question 1: Determine the effects of plant presence with regards to
microbial diversity and dominance

Plant  presence  had  an  effect  on  microbial  community  composition  and  diversity. 
This  outcome  was  expected  based  on  the  literature,  but  this  research  provided  clear 
composition  charts  and  richness  and  evenness  parameters  to  support  this  hypothesis. 

In order to address community composition, the sample sequences were compared
to RDP, a database of known 16S rRNA sequences, and classified into phyla. Results
from the RDP phyla classification showed that the organisms from the planted and
control communities were classified into 17 and 19 phyla, respectively. This included an
Unclassified Bacteria category, which was reserved for any sample sequence that did not
match a known sequence in the RDP database at 80% or better. The control community had
two phyla not seen in the planted community: TM7 and Lentisphaerae. TM7 is a
candidate phylum that was named recently. The term candidate phylum refers to
phylum-level clades with no cultured representatives, typically known only from limited
numbers of rRNA sequences (Harris 2004). TM7 has been identified through its DNA and has
not yet been cultured, but a recent study shows that the phylum is widely distributed in
the environment (Hugenholtz et al. 2001). TM7 was named after sequences obtained from a
peat bog, activated sludge, and soil (Hugenholtz et al. 1998). The phylum Lentisphaerae
is typically associated with marine organisms and is closely related to the phylum
Verrucomicrobia. The phylum was discovered in 2004 in samples cultivated from
Oregon coast seawater, and the species within the phylum are strictly aerobic (Cho 2004).

Since the sequences produced during this experiment were a small representative
sample of the total microbial population, the presence of even one individual in a
phylum was important to document. The microbial community was extremely diverse, and the
individuals present in the sample represent the dominant organisms in the total
community. All other phyla were present in approximately the same percentages in the
planted and control communities; therefore, there were no other differences in
community composition to note. To verify that the communities were similar in
composition, ANOSIM, a statistical similarity test, was performed on the RDP phylum
classifications. The analysis revealed no significant difference between the planted and
control communities (n = 5000 permutations; p = 0.75). The microbial community
composition did not change due to plant presence.

Community diversity was calculated using DOTUR, which compared sample
sequences to one another and placed sequences into OTUs based on sequence similarity.
At the species level (97% similarity), the evenness was high, while the sampling effort,
according to Good’s coverage, was low, an average of 45% for both the planted and
control communities. This indicated that the species level was vastly undersampled. The
true diversity was extremely high, which was the trend expected from the literature on
microbial species diversity. Species richness could not be determined because this level
was undersampled. However, these data do support the accepted theory that the true
microbial diversity in soil is extremely vast.
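
For reference, Good’s coverage can be computed as C = 1 - n1/N, where n1 is the number of OTUs represented by a single sequence and N is the total number of sequences. The sketch below illustrates the calculation only; the function name and OTU labels are hypothetical and are not data from this research.

```python
from collections import Counter


def goods_coverage(otu_labels):
    """Good's coverage, C = 1 - n1/N, from one OTU label per sequence."""
    counts = Counter(otu_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one sequence is required")
    singletons = sum(1 for c in counts.values() if c == 1)
    return 1.0 - singletons / total


if __name__ == "__main__":
    # Hypothetical library of seven sequences assigned to four OTUs.
    labels = ["otu_a", "otu_a", "otu_b", "otu_c", "otu_c", "otu_c", "otu_d"]
    print(f"Good's coverage: {goods_coverage(labels):.2%}")
```

A low value means that many OTUs were seen only once, so further sequencing would be likely to keep turning up new OTUs.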

Good’s coverage values indicated that the sampling effort for the phylum level
was extremely high, ~90%, for both the planted and control communities. That, coupled
with the fact that the evenness percentages at the phylum level were also high, illustrates
that the phylum level diversity could be captured by the sample sequences. Richness
parameters were significantly higher in the planted community than in the control
community. Communities associated with plant life are significantly more diverse than
unplanted communities.

Research  Question  2:  Determine  the  effects  of  plant  species  with  regards  to 
microbial  diversity  and  community  composition 

The results of this research showed that the plant species produced somewhat different
microbial compositions in the mesocosms, but the differences were not significant. RDP
phyla classifications illustrated some differences in the microbial communities associated
with each species of plant. The Firmicutes population made up 3.4% of the total
community in the Carex comosa mesocosms, while it reached only 0.46% in the Scirpus
atrovirens mesocosms and 1.2% in the Eleocharis erythropoda mesocosms. This indicated
that Carex comosa had a more prevalent population of Firmicutes in the microbial
community associated with it. Another difference was observed in the Verrucomicrobia
population: Eleocharis erythropoda held the highest percentage, 4.4%, while the other
two species averaged 2.3%. The last item to mention is that all three species of plants
had a prevalent population of Proteobacteria. However, Scirpus atrovirens had nearly
half of its individuals in this phylum, while the other two communities averaged only
33.7%. The prevalence of Proteobacteria was expected, since this phylum contains typical
soil organisms. These differences illustrate that plants can contribute to a microbial
composition in which particular phyla are more prevalent. Previous studies have shown
that different plant species can exude nutrients or other inputs that can affect the
microbial community composition (Stottmeister 2003).

To test whether the community compositions were different, ANOSIM, a
statistical similarity test, was performed on the RDP phylum classifications. The analysis
revealed no significant differences among the plant species communities (n = 5000
permutations; p = 0.21). Therefore, even with the differences noted above, the community
compositions were not significantly affected by the three plant species used in this
experiment.

Diversity  analysis  was  performed  using  DOTUR.  The  richness  parameters 
showed  some  slight  differences.  At  the  species  level,  the  evenness  was  high  while  the 
sampling  effort,  according  to  Good’s  coverage,  was  low,  an  average  of  30%  for  all  three 
communities.  This  indicated  that  the  species  level  was  vastly  undersampled.  The  true 
diversity  was  extremely  high,  which  was  the  trend  expected  from  literature  on  microbial 
diversity. 

At the phylum level the evenness was again high for all three communities, and
the sampling effort, according to Good’s coverage, was also high, an average of 92%. At
the phylum level, the rarefaction curve for Eleocharis erythropoda was the highest of the
three plant species. Scirpus atrovirens had richness slightly below Eleocharis
erythropoda, and Carex comosa had the lowest estimate. However, when the 95%
confidence intervals calculated by DOTUR were considered, this trend was not
statistically significant for every pair. The error bars of the Eleocharis and Scirpus
communities overlapped by more than 50%, as did those of the Scirpus and Carex
communities. This indicated that the richness of these pairs of communities was not
statistically different. The error bars of the Carex community did not overlap those of
the Eleocharis community on the phylum level rarefaction curve; therefore, the richness
of these two communities was statistically different, and Eleocharis erythropoda had a
more diverse community than Carex comosa. Figure 18 also suggested that with increased
sampling effort the plant species phylum rarefaction curves would split apart and become
significantly different in phylum richness.

Plants have been shown to increase diversity, both throughout the literature and in
Objective 1 above. However, plant species affect the microbial communities in
various ways depending on their nutrients, root systems, and other properties. The plants
used in this research all came from the same family and have the same growth habit.
Therefore, it was expected that diversity and composition would not differ among the
plant species. However, the results illustrate that the diversity of the Carex and
Eleocharis communities does differ significantly. Therefore, there may be a metabolic
property or other factor in one of the species that affects the microbial community
associated with it.

Research  Question  3:  Determine  the  effects  of  soil  depth  with  regards  to  microbial 
diversity  and  community  composition 

There was evidence that microbial communities varied in composition with
depth. The depth communities represented the relationships established by
subsurface flow hydrology. RDP phyla classifications illustrated some differences in the
microbial communities associated with depth. One phylum stood out among the
three depths. Chloroflexi was present at 4.4% in the bottom depth, while the top and
middle depths had only about 2.5%. This could indicate that the bottom depths are more
likely to promote an environment in which the phylum Chloroflexi can become prevalent.
Chloroflexi is a phylum that is known to contain dehalogenators. Dehalogenators are
organisms that can bioremediate contaminants such as PCE and TCE, which are the
contaminants treated by the WPAFB constructed wetland.

To test whether the community compositions were different, ANOSIM, a statistical
similarity test, was performed on the RDP phylum classifications. The analysis revealed
no significant differences among the depth communities (n = 5000 permutations;
p = 0.31). Therefore, even with the differences noted above, the community compositions
were not significantly affected by depth in this study.

The  diversity  analysis  was  calculated  using  DOTUR.  At  the  species  level,  the 
evenness  was  high  while  the  sampling  effort,  according  to  Good’s  coverage,  was  low,  an 
average  of  35%.  This  indicated  that  the  species  level  was  vastly  undersampled.  The  true 
diversity  was  extremely  high,  which  was  the  trend  expected  from  literature  on  microbial 
diversity. 

Good’s coverage values indicated that the sampling effort for the phylum level
was extremely high, ~90%, for all the depth communities. The evenness at the phylum
level was high, indicating that the distribution of OTUs was even. Richness analysis did
show that depth had an impact on richness at the phylum level. The richness of the Carex
community and the control community was significantly lower in the middle layer than at
the other two depths. However, all three plant species communities and the control show
that with increased sampling effort the richness of the depths continues to diverge. The
middle layer consistently had the lowest richness. This may be because the middle layer
lacked certain nutrients, promoted others, or had other properties that decrease
diversity.

It is also interesting to note that the Carex comosa phylum-level rarefaction curve for
the middle layer reached an asymptote lower than that of any other plant species or
control community, indicating that the Carex comosa community was associated with lower
diversity. As discussed in Chapter II, plant species can exude nutrients or have
metabolic functions that are unique. These properties allow a unique microbial community
to form in association with a particular plant species. Although Carex comosa is related
to the other two plant species used in this study, the results presented here suggest
that it still has unique properties affecting the microbial community.
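
A rarefaction curve of the kind discussed above can be sketched with the standard analytic formula for the expected number of OTUs in a random subsample of n sequences, E[S_n] = sum over OTUs of 1 - C(N - N_i, n) / C(N, n). This is an illustration under stated assumptions, not necessarily how DOTUR computes its curves. The function name and the OTU counts are hypothetical.

```python
from math import comb


def rarefied_richness(otu_counts, n):
    """Expected number of OTUs observed in a random subsample of n sequences."""
    total = sum(otu_counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the total count")
    denom = comb(total, n)
    # comb(total - c, n) is 0 when the subsample is too large to miss OTU c.
    return sum(1 - comb(total - c, n) / denom for c in otu_counts)


# Hypothetical phylum-level counts for one community (not study data).
example_counts = [40, 25, 15, 10, 5, 3, 1, 1]
for size in (10, 50, 100):
    print(size, round(rarefied_richness(example_counts, size), 2))
```

A curve that flattens at a lower value than the others, as described for the Carex comosa middle layer, corresponds to a lower expected richness at equal sampling effort.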

Limitations  of  research 

This research was an attempt to characterize the soil microbial communities
associated with plant presence, controls, and different plant species. Considering that a
single gram of soil can potentially contain 10^6 microorganisms, a sample size of 3,099
may be too small. However, reasonable interpretations can be made from the results of the
sample. Another limitation involved the PCR amplification. In this project, PCR
amplified a segment of the 16S rRNA gene, which was in turn cloned. However, there is no
guarantee that a clone generated from the PCR product was an original amplification or
just another copy. Therefore, it should be noted that this analysis mainly captures the
dominant organisms within populations. Results should be interpreted within this
context.

It is also important to mention that the three species of plant chosen for this
experiment share common ancestry and have the same herb growth habit. This means
that the plants are not very different in how they function, and they would therefore
likely affect the microbial communities in a similar fashion. If greater differences in
diversity were the goal, it might have been more advantageous to use plants with
different growth habits.

Significance of Research

This research was unique for several reasons. First, this analysis has never before
been used with a mesocosm experiment; studies using microcosms or field samples are
common. Second, 2,820 sequences were used for analysis. Previous research usually
concentrated on ~100 to ~700 sequences, so this research increased the sample size
roughly fourfold relative to the upper end of that range. This allowed more complex
results and interpretations. Lastly, this research is significant because it merged two
detailed analyses: the sequences were classified into named phyla by the RDP program,
and they were then grouped, based on evolutionary distances, using Phylip and DOTUR.
This provided an in-depth analysis of the large number of sequences generated by this
project. The results provide valuable insight into plant and depth effects on microbial
communities. Most importantly, this research enhances the understanding of the microbial
consortia needed for bioremediation.

Further  Research 

This research only hints at the true diversity of the microbial world. Therefore,
it is recommended that further research increase the sample size to as many as 8,000
sequences. A sample of this size would be expected to approach the asymptote values
seen in the richness estimations in this research, giving a better picture of the true
diversity.

Also, since this research serves as a pre-contamination baseline for comparison to
PCE-contaminated mesocosms, research should continue. This experiment should be
repeated with samples from the now-contaminated mesocosms used for this experiment.
This will allow researchers to determine the effect PCE contamination has on
microbial community composition and diversity. PCE contamination would be
expected to affect the diversity and composition of the microbes in the mesocosms.
Studies have shown that microbial communities change to handle specific contaminants.
Therefore, it is hypothesized that the post-contamination samples will show less
diversity and a greater prevalence of phyla containing known dehalogenators and
anaerobic organisms.

This research not only provides the baseline for comparison to contaminated
samples, but it also provides a baseline to investigate the trends identified. This
research showed that Chloroflexi was more prevalent in the bottom layers of all the
mesocosms. This could indicate that the bottom layer had an environment more favorable
to organisms of this phylum. The first stages of remediation in a subsurface flow
wetland occur in the bottom layers, and that was where the dehalogenators were expected.
The Carex comosa community had a significantly lower richness at the middle level. This,
combined with other research, suggests that Carex has properties that diminish richness.
An experiment should be organized to investigate this trend in Carex. Finally, this
baseline provided the community composition in the mesocosms. Further research can now
investigate the phyla and functional groups identified by this research using PCR
designed specifically to identify particular groups.

Summary 

This research has shown some interesting trends in microbial communities that
are most likely also occurring in the constructed wetland. The mesocosms were designed
with the same soil properties, hydrologic flow, and plant presence as the wetland at
WPAFB. Therefore, the trends seen in the mesocosms are most likely also present in that
wetland. Microorganisms are an invaluable natural remediation system. Research such as
this provides the background understanding needed to help natural remediation become a
more controlled and advantageous process.




Appendix  A:  PCR  Protocol  Using  HotStarTaq  Master  Mix  (Qiagen  2002). 

This  protocol  serves  only  as  a  guideline  for  PCR  amplification.  Optimal  reaction 
conditions,  such  as  incubation  times  and  temperatures,  and  amount  of  template  DNA, 
may  vary  and  need  to  be  determined  individually. 

Notes: 

Each  PCR  program  should  be  started  with  an  initial  activation  step  of  15  min 
at  95°C  to  activate  HotStarTaq  DNA  Polymerase  (see  step  6  of  this  protocol). 

HotStarTaq Master Mix provides a final concentration of 1.5 mM MgCl2 in the
final reaction mix, which will produce satisfactory results in most cases.

However, if a higher Mg2+ concentration is required, prepare a stock solution
containing 25 mM MgCl2.

Set  up  reaction  mixtures  in  an  area  separate  from  that  used  for  DNA  preparation 
or  PCR  product  analysis. 

Use disposable tips containing hydrophobic filters to minimize
cross-contamination.

1.  Thaw  primer  solutions. 

Mix  well  before  use. 

Optional: prepare a primer mix of an appropriate concentration (see Table 4)
using the water provided. This is recommended if several amplification reactions
using the same primer pair are to be performed. The final volume of diluted primer
mix should be 25 µl per reaction including the template DNA, added at step 4.

2. Mix the HotStarTaq Master Mix by vortexing briefly and dispense 25 µl into
each PCR tube according to Table 4.

It is important to mix the HotStarTaq Master Mix before use in order to avoid
localized concentrations of salt. HotStarTaq Master Mix is provided as a 2x
concentrate (i.e., a 25 µl volume of the HotStarTaq Master Mix is required for
amplification reactions with a final volume of 50 µl). For volumes smaller than
50 µl, the 1:1 ratio of HotStarTaq Master Mix to diluted primer mix and template
should be maintained as defined in Table 4. A negative control (without template
DNA) should always be included. It is not necessary to keep PCR tubes on ice as
nonspecific DNA synthesis cannot occur at room temperature due to the inactive
state of HotStarTaq DNA Polymerase.

3.  Distribute  the  appropriate  volume  of  diluted  primer  mix  into  the  PCR  tubes 
containing  the  Master  Mix. 

4. Add template DNA (≤1 µg/reaction) to the individual PCR tubes.

The volume added should not exceed 10% of the final PCR volume.

Table 4. Reaction composition using HotStarTaq Master Mix

Component                          Volume/reaction   Final concentration
HotStarTaq Master Mix              25 µl             2.5 units HotStarTaq DNA Polymerase
                                                     1x PCR Buffer*
                                                     200 µM of each dNTP
Diluted primer mix                 25 µl
  Primer A                         Variable          0.1-0.5 µM
  Primer B                         Variable          0.1-0.5 µM
  Distilled water (provided)       Variable          -
Template DNA
  Template DNA, added at step 4    Variable          ≤1 µg/reaction
Total volume                       50 µl             -

*Contains 1.5 mM MgCl2
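
The volume rules stated in the protocol (a 1:1 ratio of 2x Master Mix to diluted primer mix plus template, and a template volume of no more than 10% of the final volume) can be sketched as a small calculation. The function name, the dictionary keys, and the example template volume are hypothetical and are not part of the Qiagen protocol.

```python
def reaction_volumes(final_volume_ul=50.0, template_volume_ul=2.0):
    """Split a reaction into Master Mix, primer mix and template volumes (µl)."""
    if template_volume_ul < 0:
        raise ValueError("template volume cannot be negative")
    if template_volume_ul > 0.10 * final_volume_ul:
        raise ValueError("template volume exceeds 10% of the final PCR volume")
    half = final_volume_ul / 2  # the 2x Master Mix makes up half of the reaction
    return {
        "master_mix_ul": half,
        # the other half is diluted primer mix *including* the template DNA
        "primer_mix_without_template_ul": half - template_volume_ul,
        "template_ul": template_volume_ul,
    }


print(reaction_volumes(50.0, 2.0))
print(reaction_volumes(25.0, 1.0))
```

For a standard 50 µl reaction with 2 µl of template, this gives 25 µl of Master Mix and 23 µl of primers and water.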

5. When using thermal cyclers with a heated lid, do not use mineral oil. Proceed
directly to step 6. Otherwise, overlay with approximately 50 µl mineral oil.


6.  Program  the  thermal  cycler  according  to  the  manufacturer’s  instructions. 

Each  PCR  program  must  start  with  an  initial  heat  activation  step  at  95°C  for  15  min. 

A  typical  PCR  cycling  program  is  outlined  below.  For  maximum  yield  and 
specificity,  temperatures  and  cycling  times  should  be  optimized  for  each  new  template 
target  and  primer  pair. 


Step                       Time        Temperature   Additional comments
Initial activation step    15 min      95°C          HotStarTaq DNA Polymerase is
                                                     activated by this heating step
3-step cycling
  Denaturation             0.5-1 min   94°C
  Annealing                0.5-1 min   50°C-68°C     5°C below Tm of primers
  Extension                1 min       72°C          For PCR products longer than 1 kb,
                                                     use an extension time of approximately
                                                     1 min per kb DNA
Number of cycles           20-35
Final extension            10 min      72°C

7.  Place  the  PCR  tubes  in  the  thermal  cycler  and  start  cycling  program. 

Note:  After  amplification,  samples  can  be  stored  overnight  at  2-8°C  or  at  -20°C  for 
longer  storage. 
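
As a rough planning aid, the programmed hold time of the cycling program above can be added up as follows. This is a sketch only: it ignores ramp times between temperatures, and the function names and the 1.5 kb, 30-cycle example are hypothetical rather than the settings used in this research.

```python
def extension_minutes(product_kb):
    """About 1 min per kb for products longer than 1 kb, otherwise 1 min."""
    return max(1.0, float(product_kb))


def program_minutes(cycles, denature_min, anneal_min, extend_min,
                    activation_min=15.0, final_extension_min=10.0):
    """Total programmed hold time in minutes, excluding ramping."""
    per_cycle = denature_min + anneal_min + extend_min
    return activation_min + cycles * per_cycle + final_extension_min


# Hypothetical example: a 1.5 kb product amplified for 30 cycles.
extend = extension_minutes(1.5)
print(program_minutes(cycles=30, denature_min=1.0, anneal_min=1.0, extend_min=extend))
```

The 15 minute activation step and the 10 minute final extension are fixed costs, so the cycle number and extension time dominate the run length.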

Appendix B: Cloning Month Legend


A-August 

S-September 

O-October 

N-November 

D-December 

J-January 

F-February 

M-March 

Ap-April

My-May 

Ju-June 

Jy-July 




Appendix  C:  Mo  Bio  PowerSoil™  DNA  Isolation  Kit  Extraction  Protocol.  (Mo  Bio 

Laboratories,  Carlsbad,  CA  2004) 


Introduction 

The  PowerSoil™  DNA  Isolation  Kit  is  comprised  of  a  novel  and  proprietary  method  for 
isolating  genomic  DNA  from  environmental  samples.  The  kit  is  intended  for  use  with 
environmental  samples  containing  a  high  humic  acid  content  including  difficult  soil  types 
such  as  compost,  sediment,  and  manure.  Other  more  common  soil  types  have  also  been 
used  successfully  with  this  kit.  The  isolated  DNA  has  a  high  level  of  purity  allowing  for 
more  successful  PCR  amplification  of  organisms  from  the  sample.  PCR  analysis  has 
been  performed  to  detect  a  variety  of  organisms  including  bacteria  (e.g.  Bacillus  subtilis, 
Bacillus anthracis), fungi (e.g. yeasts, molds), algae and Actinomycetes (e.g.
Streptomyces). 

The  PowerSoil™  DNA  Isolation  Kit  distinguishes  itself  from  Mo  Bio’s  Ultraclean™  Soil 
DNA  Isolation  kit  with  a  NEW  humic  substance/brown  color  removal  procedure.  This 
new  procedure  is  effective  at  removing  PCR  inhibitors  from  even  the  most  difficult  soil 
types. 

Environmental  samples  are  added  to  a  bead  beating  tube  for  rapid  and  thorough 
homogenization.  Cell  lysis  occurs  by  mechanical  and  chemical  methods.  Total  genomic 
DNA  is  captured  on  a  silica  membrane  in  a  spin  column  format.  DNA  is  then  washed 
and  eluted  from  the  membrane.  DNA  is  then  ready  for  PCR  analysis  and  other 
downstream  applications. 

WARNING:  Solution  C5  contains  ethanol.  It  is  flammable. 

IMPORTANT  NOTE  FOR  USE:  Make  sure  the  2  ml  PowerBead  Tubes  rotate 
freely  in  your  centrifuge  without  rubbing. 

Kit  Storage 

Kit reagents and components should be stored at room temperature.

Kit Contents

Component                                     Quantity (12888-50)   Quantity (12888-100)
PowerBead Tubes (contains 750 µl solution)    50                    100
Solution C1                                   3.3 ml                6.6 ml
Solution C2                                   14 ml                 28 ml
Solution C3                                   11 ml                 22 ml
Solution C4                                   72 ml                 144 ml
Solution C5                                   27.5 ml               55 ml
Solution C6                                   6 ml                  12 ml
Spin Filter Units in 2 ml Tubes               50                    100
Collection Tubes (2 ml)                       200                   400

1.  To  the  2  ml  PowerBead  Tubes  provided,  add  0.25  gm  of  soil  sample. 

2.  Gently  vortex  to  mix. 

3. Check Solution C1. If Solution C1 is precipitated, heat solution to 60°C until
dissolved before use.

4. Add 60 µl of Solution C1 and invert several times or vortex briefly.

5.  Secure  PowerBead  Tubes  horizontally  using  the  Mo  Bio  Vortex  Adapter  tube  holder 
for  the  vortex  (Mo  Bio  Catalog  No.  13000-V1.  Call  1-800-606-6246  for  information)  or 
secure  tubes  horizontally  on  a  flat-bed  vortex  pad  with  tape.  Vortex  at  maximum  speed 
for  10  minutes. 

6.  Make  sure  the  PowerBead  Tubes  rotate  freely  in  your  centrifuge  without  rubbing. 
Centrifuge  tubes  at  10,000  x  g  for  30  seconds.  CAUTION:  Be  sure  not  to  exceed 
10,000  x  g  or  tubes  may  break. 

7.  Transfer  the  supernatant  to  a  clean  microcentrifuge  tube  (provided). 

Note: Expect between 400 and 500 µl of supernatant. Supernatant may still contain
some soil particles.

8. Add 250 µl of Solution C2 and vortex for 5 seconds. Incubate at 4°C for 5 minutes.

9. Centrifuge the tubes for 1 minute at 10,000 x g.

10. Avoiding the pellet, transfer up to, but no more than, 600 µl of supernatant to a clean
microcentrifuge tube (provided).

11. Add 200 µl of Solution C3 and vortex briefly. Incubate at 4°C for 5 minutes.

12. Centrifuge the tubes for 1 minute at 10,000 x g.

13. Avoiding the pellet, transfer up to, but no more than, 750 µl of supernatant to a clean
microcentrifuge tube (provided).

14. Add 1200 µl of Solution C4 to the supernatant and vortex for 5 seconds.

15. Load approximately 675 µl onto a spin filter and centrifuge at 10,000 x g for 1
minute. Discard the flow through and add an additional 675 µl of supernatant to the spin
filter and centrifuge at 10,000 x g for 1 minute. Load the remaining supernatant onto the
spin filter and centrifuge at 10,000 x g for 1 minute. Note: A total of three loads for each
sample processed are required.

16. Add 500 µl of Solution C5 and centrifuge for 30 seconds at 10,000 x g.

17.  Discard  flow  through. 

18.  Centrifuge  again  for  1  minute. 

19.  Carefully  place  spin  filter  in  a  new  clean  tube  (provided).  Avoid  splashing  any 
Solution  C5  onto  the  spin  filter. 

20. Add 100 µl of Solution C6 to the center of the white filter membrane. Alternatively,
sterile  DNA-Free  PCR  Grade  Water  may  be  used  for  elution  from  the  silica  spin  filter 
membrane  at  this  step  (Mo  Bio  Catalog  No.  17000-10). 

21.  Centrifuge  for  30  seconds. 

22.  Discard  the  spin  filter.  DNA  in  the  tube  is  now  application  ready.  No  further  steps 
are  required. 

We  recommend  storing  DNA  frozen  (-20°C  to  -80°C).  Solution  C6  contains  no  EDTA. 
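
The protocol above specifies centrifugation in units of relative centrifugal force (x g) and warns against exceeding 10,000 x g, whereas many microcentrifuges are set in rpm. The usual conversion is RCF = 1.118e-5 x r x rpm^2, with the rotor radius r in centimeters. The sketch below applies that relation. The function names and the 7 cm radius are assumptions for illustration; the actual radius must be taken from the rotor documentation.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in rpm


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) at a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_for_rcf(rcf, radius_cm):
    """Rotor speed (rpm) needed to reach a given relative centrifugal force."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


# Hypothetical rotor with a 7 cm maximum radius.
print(round(rpm_for_rcf(10000, 7.0)))
```

Because the force grows with the square of the speed, a modest overshoot in rpm produces a much larger overshoot in x g.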

Wet  Soil  Sample 

If  soil  sample  is  high  in  water  content,  remove  contents  from  PowerBead  Tube  (beads 
and  solution)  and  transfer  into  another  sterile  microcentrifuge  tube  (not  provided).  Add 
soil  sample  to  PowerBead  Tube  and  centrifuge  for  30  seconds  at  10,000  x  g.  Remove  as 
much  liquid  as  possible  with  a  pipet  tip.  Add  beads  and  bead  solution  back  to 
PowerBead  Tube  and  follow  protocol  starting  at  step  2. 

If  DNA  Does  Not  Amplify 

Make  sure  to  check  DNA  yields  by  gel  electrophoresis  or  spectrophotometer 
reading.  An  excess  amount  of  DNA  will  inhibit  PCR  reaction. 

Diluting  the  template  DNA  should  not  be  necessary  with  DNA  isolated  with  the 
PowerSoil  DNA  Isolation  Kit;  however,  it  should  still  be  attempted. 

If  DNA  will  still  not  amplify  after  trying  the  steps  above,  then  PCR  optimization 
(changing  reaction  conditions  and  primer  choice)  may  be  needed. 

Eluted  DNA  Sample  Is  Brown 

We  have  not  observed  any  coloration  in  DNAs  isolated  using  the  PowerSoil  DNA 
Isolation  Kit.  If  you  observe  coloration  in  your  samples,  please  contact  technical  support 
for  suggestions. 

Alternative  Lysis  Method 

After adding Solution C1, vortex 3-4 seconds, then heat to 70°C for 5 minutes. Vortex
3-4 seconds. Heat another 5 minutes. Vortex 3-4 seconds. This alternative procedure will
reduce shearing but may also reduce yield.

Concentrating  the  DNA 

Your final volume will be 100 µl. If this is too dilute for your purposes, add 4 µl of 5 M
NaCl and mix. Add 200 µl of 100% cold ethanol and mix. Centrifuge at 10,000 x g for 5
minutes. Decant all liquid. Dry residual ethanol in a speed vac, desiccator, or air dry.
Resuspend precipitated DNA in desired volume.

DNA Floats Out of Well When Loaded on a Gel

You  may  have  inadvertently  transferred  some  residual  Solution  C5  into  the  final  sample. 
Prevent  this  by  being  careful  in  step  19  not  to  transfer  liquid  onto  the  bottom  of  the  spin 
filter  basket.  Ethanol  precipitation  is  the  best  way  to  remove  Solution  C5  residue.  (See 
“Concentrating  the  DNA”  above) 

Storing  DNA 

DNA is eluted in Solution C6 (10 mM Tris) and must be stored at -20°C to -80°C or it may
degrade  over  time.  DNA  can  be  eluted  in  TE  but  the  EDTA  may  inhibit  reactions  such  as 
PCR  and  automated  sequencing.  DNA  may  be  eluted  with  sterile  DNA-Free  PCR  Grade 
Water  (Mo  Bio  Catalog  No.  17000-10). 

Cells  are  Difficult  to  Lyse 

If cells are difficult to lyse, a 10 minute incubation at 70°C, after adding Solution C1, can
be  performed.  Follow  by  continuing  with  protocol  step  5. 

Technical  Information 

Product manufactured by Mo Bio Laboratories, Inc., 2746 Loker Avenue West,
Carlsbad, CA 92008.

Appendix D: StrataClone™ PCR Cloning Kit


MATERIALS PROVIDED

Materials Provided                                Quantity (a)
                                                  Catalog #240205                   Catalog #240206
StrataClone™ Vector Mix                           21 reactions (µl each)
StrataClone™ Cloning Buffer                       63 µl                             63 µl
StrataClone™ Control Insert (5 ng/µl)             50 ng                             50 ng
StrataClone™ SoloPack® Competent Cells            21 transformations (50 µl each)   11 transformations (50 µl each)
pUC18 Control Plasmid (0.1 ng/µl in TE Buffer)    10 µl                             10 µl

a  Catalog  #240205  provides  enough  reagents  for  20  experimental  cloning  reactions  plus 
one  Control  Insert  cloning  reaction.  Catalog  #240206  provides  enough  reagents  for  10 
experimental  cloning  reactions  plus  one  Control  Insert  cloning  reaction. 


STORAGE  CONDITIONS 

StrataClone™  SoloPack®  Competent  Cells  and  pUC18  Control  Plasmid:  -80°C 
All  Other  Components:  -20°C 

Note: The StrataClone SoloPack competent cells are sensitive to variations in
temperature and must be stored at the bottom of a -80°C freezer. Transferring
tubes from one freezer to another may result in a loss of efficiency.

ADDITIONAL  MATERIALS  REQUIRED 

Taq  DNA  polymerase  or  a  polymerase  blend  recommended  for  PCR  cloning 
Thermocycler 
LB-ampicillin  agar  plates 
LB  medium 

5-Bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-gal)

INTRODUCTION 

The StrataClone™ PCR Cloning Kit allows high-efficiency, 5-minute cloning of PCR
products,  using  the  efficient  DNA  rejoining  activity  of  DNA  topoisomerase  I  and  the 
DNA  recombination  activity  of  Cre  recombinase. 

Overview  of  StrataClone™  PCR  Cloning  Technology 

StrataClone  PCR  cloning  technology  exploits  the  combined  activities  of  topoisomerase  I 
from Vaccinia virus and Cre recombinase from bacteriophage P1. In vivo, DNA
topoisomerase I assists in DNA replication by relaxing and rejoining DNA strands.
Topoisomerase I cleaves the phosphodiester backbone of a DNA strand after the
sequence 5'-CCCTT, forming a covalent DNA-enzyme intermediate which conserves
bond energy to be used for religating the cleaved DNA back to the original strand. Once
the covalent DNA-enzyme intermediate is formed, the religation reaction can also occur
with a heterologous DNA acceptor. The Cre recombinase enzyme catalyzes
recombination between two loxP recognition sequences.

The  StrataClone  PCR  cloning  vector  mix  contains  two  DNA  arms,  each  charged  with 
topoisomerase  I  on  one  end  and  containing  a  loxP  recognition  sequence  on  the  other  end. 
The  topoisomerase-charged  ends  have  a  modified  uridine  (U*)  overhang.  Taq-amplified 
PCR products, which contain 3'-adenosine overhangs, are efficiently ligated to these
vector  arms  in  a  5-minute  ligation  reaction,  through  A-U*  base-pairing  followed  by 
topoisomerase  I-mediated  strand  ligation. 

The resulting linear molecule (vector arm ori-PCR product-vector arm amp) is then
transformed,  with  no  clean-up  steps  required,  into  a  competent  cell  line  engineered  to 
transiently  express  Cre  recombinase.  Cre-mediated  recombination  between  the  vector 
loxP  sites  creates  a  circular  DNA  molecule  (pSC-A-amp/kan,  see  Figure  2)  that  is 
proficient  for  replication  in  cells  growing  on  media  containing  ampicillin.  The  resulting 
pSC-A product includes a lacZ' α-complementation cassette for blue-white screening.

StrataClone™  SoloPack®  Competent  Cells 

The  provided  StrataClone  SoloPack  competent  cells  express  Cre  recombinase,  in  order  to 
circularize  the  linear  DNA  molecules  produced  by  topoisomerase  I-mediated  ligation. 
The  cells  are  provided  in  a  convenient  single-tube  transformation  format.  This  host  strain 
(containing the lacZΔM15 mutation) supports blue-white screening with plasmid pSC-A,
containing the lacZ' α-complementation cassette (see Figure 2). It is not necessary to
induce lacZ' expression with IPTG when performing blue-white screening with this
strain. 

The  StrataClone  SoloPack  competent  cells  are  optimized  for  high  efficiency 
transformation  and  recovery  of  high-quality  recombinant  DNA.  The  cells  are 
endonuclease (endA), and recombination (recA) deficient, and are restriction-minus. The
cells lack the tonA receptor, conferring resistance to T1, T5, and φ80 bacteriophage
infection,  and  lack  the  F'  episome.  StrataClone  SoloPack  competent  cells  are  resistant  to 
streptomycin. 

PCR  CLONING  PROTOCOL 
Preparing  the  PCR  Product 

1. Prepare insert DNA by PCR using Taq DNA polymerase or an enzyme blend qualified
for PCR cloning applications.

Note: Taq DNA polymerase is required for the addition of 3'-adenine residues to the
PCR product. If PCR was performed using a proofreading DNA polymerase, see
Appendix II for a protocol for adding 3'-A overhangs after the PCR reaction is
complete.

If  the  PCR  template  is  a  plasmid  encoding  the  ampicillin  resistance  gene,  the 
plasmid  DNA  must  be  eliminated  prior  to  the  cloning  reaction  by  Dpn  I  digestion 
or  by  gel  purification  of  the  PCR  product. 

2.  Analyze  an  aliquot  of  the  PCR  reaction  on  an  agarose  gel  to  verify  production  of  the 
expected  fragment. 

3.  If  the  fragment  to  be  cloned  is  <3  kb  and  gel  analysis  confirms  robust,  specific 
amplification, prepare a 1:10 dilution of the PCR reaction in dH2O. For larger or poorly
amplified  fragments,  omit  the  dilution  step. 

Note: If multiple PCR products are observed on the gel, or when cloning very
large  PCR  products,  gel  isolate  the  desired  PCR  product  prior  to  performing  the 
ligation  reaction.  See  Appendix  I  for  a  gel-isolation  protocol.  For  a  gel-isolated 
PCR product recovered in 50 µl, add 2 µl (undiluted) of the purified PCR product
to  the  ligation  reaction  below. 

Ligating  the  Insert 

4.  Prepare  the  ligation  reaction  mixture  by  combining  (in  order)  the 
following  components: 

3 µl StrataClone™ Cloning Buffer

2 µl of PCR product (5-50 ng, typically a 1:10 dilution of a robust PCR reaction)
or 2 µl of StrataClone™ Control Insert

2 µl StrataClone™ Vector Mix

5.  Mix  gently  by  repeated  pipetting,  and  then  incubate  the  ligation  reaction  at  room 
temperature  for  5  minutes.  When  the  incubation  is  complete,  place  the  reaction  on  ice. 

Note: The cloning reaction may be stored at -20°C for later processing.


Transforming  the  Competent  Cells 

6.  Thaw  one  tube  of  StrataClone  SoloPack  competent  cells  on  ice  for  each  ligation 
reaction. 

Note: It is critical to use the provided StrataClone SoloPack competent cells,
expressing Cre recombinase, for this protocol. Do not substitute with another
strain.

7. Add 1 µl of the cloning reaction mixture to the tube of thawed competent cells. Mix
gently (do not mix by repeated pipetting).

Notes: For large PCR products, up to 2 µl of the cloning reaction mixture may be
added to the transformation reaction.

If desired, test transformation efficiency of the competent cells by transforming a
separate tube of competent cells with 10 pg of pUC18 control DNA. Prior to use,
dilute the pUC18 DNA provided 1:10 in dH2O, and then add 1 µl of the dilution
to the tube of competent cells.

8.  Incubate  the  transformation  mixture  on  ice  for  20  minutes.  During  the  incubation 
period,  pre-warm  SOC  medium  to  42°C. 

9.  Heat-shock  the  transformation  mixture  at  42°C  for  45  seconds. 

10.  Incubate  the  transformation  mixture  on  ice  for  2  minutes. 

11. Add 250 µl of pre-warmed SOC medium to the transformation reaction mixture.
Allow  the  competent  cells  to  recover  for  at  least  1  hour  at  37°C  with  agitation.  (Lay  the 
tube  of  cells  on  the  shaker  horizontally  for  better  aeration.) 

12. During the outgrowth period, prepare LB-ampicillin plates for blue-white color
screening by spreading 40 µl of 2% X-gal on each plate.

13. Plate 5 µl and 100 µl of the transformation mixture on the LB-ampicillin-X-gal
plates. Incubate the plates overnight at 37°C.

Notes: For the Control Insert cloning reaction, plate 10 µl of the transformation
mixture.

For the pUC18 control transformation, plate 30 µl of the transformation mixture.

When spreading <50 µl of transformation mixture, pipette the cells into a 50-µl
pool of SOC medium before spreading.

14. Pick white colonies for plasmid DNA analysis.

Notes: Colonies harboring plasmids containing typical PCR product inserts are
expected  to  be  white.  After  prolonged  incubation,  some  of  the  insert-containing 
colonies  may  appear  light  blue. 
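
For the optional pUC18 control described in the protocol above, transformation efficiency is conventionally expressed as colony-forming units per µg of DNA. A minimal sketch of that arithmetic follows, using the volumes stated in the protocol (50 µl of cells, 1 µl of diluted pUC18 carrying 10 pg, 250 µl of SOC, and 30 µl plated). The function name and the colony count are hypothetical and do not represent results from this research or a kit specification.

```python
def transformation_efficiency(colonies, dna_pg=10.0, plated_ul=30.0, total_ul=301.0):
    """Colony-forming units per µg of plasmid DNA.

    total_ul is 50 µl cells + 1 µl diluted pUC18 + 250 µl SOC = 301 µl.
    """
    dna_ug = dna_pg * 1e-6                      # picograms to micrograms
    dna_plated_ug = dna_ug * plated_ul / total_ul
    return colonies / dna_plated_ug


# Hypothetical count of 100 colonies on the pUC18 control plate.
print(f"{transformation_efficiency(100):.2e} cfu/µg")
```

Only the fraction of the DNA that actually reaches the plate is counted, which is why the plated and total volumes both enter the calculation.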
Appendix E: Plasmid Prep Protocol

QIAprep 8 Turbo Miniprep Kit      (10)        (50)
Catalog no.                       27152       27154
Turbofilter® 8 Strips             10          50
QIAprep 8 Strips                  10          50
Buffer P1                         40 ml       125 ml
Buffer P2                         40 ml       125 ml
Buffer N3*                        60 ml       2 x 125 ml
Buffer PB*                        100 ml      500 ml
Buffer PE (concentrate)           2 x 20 ml   2 x 100 ml
Buffer EB                         55 ml       2 x 55 ml
RNase A                           400 µl†     125 µl‡
Collection Microtubes (1.2 ml)    13 x 8      55 x 8
Caps for QIAprep Strips           13 x 8      55 x 8
Caps for Collection Microtubes    13 x 8      55 x 8
Handbook                          1           1

* Buffers N3 and PB contain chaotropic salts which are irritants and not
compatible with disinfecting agents containing bleach. Take appropriate
laboratory safety measures and wear gloves when handling.

† Provided as a 10 mg/ml solution
‡ Provided as a 100 mg/ml solution

Introduction 

The  QIAprep  Miniprep  system  provides  a  fast,  simple,  and  cost-effective  plasmid 
miniprep  method  for  routine  molecular  biology  laboratory  applications.  QIAprep 
Miniprep  Kits  use  silica  membrane  technology  to  eliminate  the  cumbersome  steps 
associated with loose resins or slurries. Plasmid DNA purified with QIAprep Miniprep
Kits  is  immediately  ready  for  use.  Phenol  extraction  and  ethanol  precipitation  are  not 
required,  and  high-quality  plasmid  DNA  is  eluted  in  a  small  volume  of  Tris  buffer 
(included  in  each  kit)  or  water.  The  QIAprep  system  consists  of  four  products  with 
different  handling  options  to  suit  every  throughput  need. 

Low  throughput 

The  QIAprep  Spin  Miniprep  Kit  is  designed  for  quick  and  convenient  processing  of  1- 
24  samples  simultaneously  in  less  than  30  minutes.  QIAprep  spin  columns  can  be  used  in 
a  microcentrifuge  or  on  any  vacuum  manifold  with  luer  connectors  (e.g.,  QIAvac  24  Plus, 
or  QIAvac  6S  with  QIAvac  Luer  Adapters). 

Principle 

The  QIAprep  miniprep  procedure  is  based  on  alkaline  lysis  of  bacterial  cells  followed  by 
adsorption  of  DNA  onto  silica  in  the  presence  of  high  salt.  The  unique  silica  membrane 
used in the QIAprep Miniprep Kit completely replaces glass or silica slurries for plasmid
minipreps. 

The  procedure  consists  of  three  basic  steps: 

Preparation  and  clearing  of  a  bacterial  lysate 
Adsorption  of  DNA  onto  the  QIAprep  membrane 
Washing  and  elution  of  plasmid  DNA 

Protocol:  Plasmid  DNA  Purification  Using  the  QIAprep  Spin  Miniprep  Kit  and  a 
Microcentrifuge 

This protocol is designed for purification of up to 20 µg of high-copy plasmid DNA from
1-5 ml overnight cultures of E. coli in LB (Luria-Bertani) medium.

1. Resuspend pelleted bacterial cells in 250 µl Buffer P1 and transfer to a
microcentrifuge tube.

Ensure that RNase A has been added to Buffer P1. No cell clumps should be visible
after resuspension of the pellet.

If LyseBlue reagent has been added to Buffer P1, vigorously shake the buffer bottle to
ensure LyseBlue particles are completely dissolved. The bacteria should be
resuspended completely by vortexing or pipetting up and down until no cell clumps
remain.

2. Add 250 µl Buffer P2 and mix thoroughly by inverting the tube 4-6 times.

Mix  gently  by  inverting  the  tube.  Do  not  vortex,  as  this  will  result  in  shearing  of 
genomic  DNA.  If  necessary,  continue  inverting  the  tube  until  the  solution  becomes 
viscous  and  slightly  clear.  Do  not  allow  the  lysis  reaction  to  proceed  for  more  than  5 
min. 

If LyseBlue has been added to Buffer P1, the cell suspension will turn blue after
addition of Buffer P2. Mixing should result in a homogeneously colored suspension. If
the suspension contains localized colorless regions or if brownish cell clumps are still
visible, continue mixing the solution until a homogeneously colored suspension is
achieved.

3. Add 350 µl Buffer N3 and mix immediately and thoroughly by inverting the tube
4-6  times. 

To  avoid  localized  precipitation,  mix  the  solution  thoroughly,  immediately  after 
addition  of  Buffer  N3.  Large  culture  volumes  (e.g.,  >5  ml)  may  require  inverting  up 
to  10  times.  The  solution  should  become  cloudy. 

If  LyseBlue  reagent  has  been  used,  the  suspension  should  be  mixed  until  all  trace  of 
blue has gone and the suspension is colorless. A homogeneous colorless suspension
indicates that the SDS has been effectively precipitated.

4. Centrifuge for 10 min at 13,000 rpm (~17,900 x g) in a table-top microcentrifuge.

A  compact  white  pellet  will  form. 

5.  Apply  supernatants  from  step  4  to  the  QIAprep  spin  columns  by  decanting  or 
pipetting. 

6.  Centrifuge  for  30-60  s.  Discard  the  flow-through. 

7. Wash the QIAprep spin column by adding 0.5 ml Buffer PB and centrifuging for
30-60 s. Discard the flow-through.

8.  Wash  QIAprep  spin  column  by  adding  0.75  ml  Buffer  PE  and  centrifuging  for 
30-60  s. 

9.  Discard  the  flow-through,  and  centrifuge  for  an  additional  1  min  to  remove 
residual  wash  buffer. 

Important: Residual wash buffer will not be completely removed unless the
flow-through is discarded before this additional centrifugation. Residual ethanol from
Buffer PE may inhibit subsequent enzymatic reactions.

10. Place the QIAprep column in a clean 1.5 ml microcentrifuge tube. To elute
DNA, add 50 µl Buffer EB (10 mM Tris-Cl, pH 8.5) or water to the center of
each QIAprep spin column, let stand for 1 min, and centrifuge for 1 min.
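
Step 4 quotes the spin both as a rotor speed (13,000 rpm) and as a relative centrifugal force (~17,900 x g). The two are linked by the rotor radius, which the protocol does not state. The following is a minimal sketch of the standard conversion; the 9.5 cm radius is an assumption chosen only because it roughly reproduces the quoted figure, and the function names are illustrative.

```python
import math

# Standard conversion: RCF = 1.118e-5 * radius_cm * rpm^2
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) for a given rotor speed."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed needed to reach a target RCF on a rotor of the given radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


# Assumed rotor radius (cm); check the rotor's documentation for the real value.
ASSUMED_RADIUS_CM = 9.5

rcf_at_13000 = rcf_from_rpm(13000, ASSUMED_RADIUS_CM)
print(f"13,000 rpm at {ASSUMED_RADIUS_CM} cm is about {rcf_at_13000:.0f} x g")
```

When a different microcentrifuge is used, matching the RCF rather than the rpm keeps the pelleting conditions comparable.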
Appendix F: Restriction Digest Protocol (Promega, Madison, WI, 2008)


Introduction 

Restriction  enzymes,  also  referred  to  as  restriction  endonucleases,  are  enzymes  which 
recognize  short,  specific  (often  palindromic)  DNA  sequences.  They  cleave  double- 
stranded  DNA  (dsDNA)  at  specific  sites  within  or  adjacent  to  their  recognition 
sequences.  Most  restriction  enzymes  (REs)  will  not  cut  DNA  that  is  methylated  on  one 
or  both  strands  of  their  recognition  site,  although  some  require  substrate  methylation. 

Each  restriction  enzyme  has  specific  requirements  to  achieve  optimal  activity.  Ideal 
storage  and  assay  conditions  favor  the  most  activity  and  highest  fidelity  in  a  particular 
enzyme’s function. Conditions such as temperature, pH, enzyme cofactor(s), salt
composition and ionic strength affect enzyme activity and stability. Two buffers usually
accompany each of Promega’s restriction enzymes. One buffer is the optimal
reaction buffer, which may be from the 4-CORE® System (Reaction Buffers A, B, C, D)
or one of the other optimal buffers (Reaction Buffers E-L), and the other is the
MULTI-CORE™ Buffer. The supplied optimal buffer always yields 100% activity for the
enzyme  it  accompanies,  and  serves  as  the  specific  reaction  buffer  for  individual  digests 
with  that  enzyme.  The  MULTI-CORE™  Buffer,  which  is  designed  for  broad 
compatibility  with  many  REs,  is  provided  with  enzymes  that  have  25%  or  greater  activity 
in  the  buffer.  The  MULTI-CORE™  Buffer  is  useful  for  multiple  digests  because  it 
generally  yields  more  activity  for  more  enzyme  combinations  than  any  of  the  other 
buffers,  but  sometimes  with  a  compromise  in  activity.  Multiple  digests  using  REs  with 
significantly  different  buffer  requirements  may  require  a  sequential  reaction  with  the 
addition of RE buffer or salt before the second enzyme is used.

DNA  Substrate  Considerations 

DNA  substrates  commonly  used  for  restriction  enzyme  digestion  include  DNA  from 
bacteriophage  lambda,  bacterial  plasmid  DNA  and  genomic  DNA.  Lambda  DNA  is  a 
linear  DNA  form  that  is  an  industry  standard  for  measuring  and  expressing  unit  activity 
for  many  restriction  enzymes.  Compared  to  linear  DNA,  intact  supercoiled  plasmid  DNA 
(and DNAs with a large number of the target restriction site) require more units of
enzyme  (two-  to  tenfold)  per  microgram  than  the  DNA  used  in  the  enzyme’s  activity 
assay. 

PCR  products  and  oligonucleotides  are  relatively  small  compared  with  DNA  used  for 
defining  RE  units.  Therefore,  when  using  these  substrates  in  a  restriction  digest,  it  is 
essential  to  take  into  consideration  the  molar  concentration  of  enzyme  recognition  sites 
and not just the mass of DNA. Also, some REs require flanking bases surrounding the core
RE  restriction  site.  This  is  problematic  when  it  is  necessary  to  cut  an  oligonucleotide  or  a 
fragment  of  DNA  with  an  RE  site  near  its  end.  When  PCR  cloning  strategies  include  the 
use  of  primers  containing  an  RE  site,  care  is  necessary  in  designing  the  primer  with 
adequate  DNA  surrounding  the  core  RE  recognition  sequence. 
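
To illustrate why the molar concentration of recognition sites matters more than DNA mass, the sketch below compares 1 µg of lambda DNA with 1 µg of a short PCR product. It assumes an average mass of 650 Da per base pair (a common approximation), five EcoRI sites in the 48,502 bp lambda genome, and a hypothetical 1,500 bp PCR product carrying a single site; the function names are illustrative.

```python
# Assumed average molecular mass of one base pair of dsDNA, in g/mol.
AVG_BP_MASS_DA = 650.0


def pmol_molecules(mass_ug: float, length_bp: int) -> float:
    """Picomoles of dsDNA molecules in a given mass of DNA."""
    # 1 ug = 1e6 pg, and a molecule of length_bp weighs length_bp * 650 pg per pmol.
    return mass_ug * 1e6 / (length_bp * AVG_BP_MASS_DA)


def pmol_sites(mass_ug: float, length_bp: int, sites_per_molecule: int) -> float:
    """Picomoles of recognition sites in a given mass of DNA."""
    return pmol_molecules(mass_ug, length_bp) * sites_per_molecule


# 1 ug of lambda DNA (48,502 bp, 5 EcoRI sites).
lambda_sites = pmol_sites(1.0, 48502, 5)
# 1 ug of a hypothetical 1,500 bp PCR product with one site.
pcr_sites = pmol_sites(1.0, 1500, 1)

print(f"lambda: {lambda_sites:.3f} pmol sites per ug")
print(f"PCR product: {pcr_sites:.3f} pmol sites per ug")
print(f"ratio: {pcr_sites / lambda_sites:.1f}x")
```

Under these assumptions the same mass of PCR product presents several times more sites than lambda DNA, so unit definitions based on lambda DNA can underestimate the enzyme needed.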
In addition to the form and original source of the DNA, the purity is another factor that
must  be  considered.  Depending  on  the  purification  method  and  the  handling  of  the  DNA, 
it  may  contain  varying  amounts  of  contaminants  that  affect  restriction  enzyme  digestion 
and  analysis.  Contaminants  may  include  other  types  of  DNA,  nucleases,  salts  and 
inhibitors of restriction enzymes. The effect of a contaminant on an RE digest is
generally  dose-dependent:  i.e.,  the  inhibitory  effects  will  increase  with  the  volume  of 
DNA  added  to  the  restriction  enzyme  reaction.  Relatively  pure  DNA  is  required  for 
efficient  restriction  enzyme  digestion.  Contaminating  nucleases  are  usually  activated 
only  after  the  addition  of  salts  (e.g.,  restriction  enzyme  buffer)  to  the  DNA  solution. 
Therefore,  appropriate  control  reactions  should  always  be  run  in  parallel  with  the 
restriction digest. Buffer solutions containing EDTA in low concentrations (1 mM) are
often  used  to  protect  DNA  from  nuclease  degradation  during  storage,  but  the  EDTA  can 
interfere  with  restriction  enzyme  digestion  if  the  final  concentration  of  EDTA  in  the 
reaction  is  too  high.  This  situation  usually  results  when  the  concentration  of  the  substrate 
DNA  is  low  and  it  is  necessary  to  use  a  large  volume  of  DNA  in  the  digest.  In  such 
cases,  it  is  best  to  concentrate  the  DNA  (e.g.,  by  ethanol  precipitation).  The  organic 
solvents,  salts,  detergents  and  chelating  agents  that  are  sometimes  used  during  the 
purification of DNA can also interfere with restriction enzyme activity if they carry over
into the final DNA solution. Dialysis and/or ethanol precipitation with 2.5 M ammonium
acetate  (final  concentration  before  adding  ethanol)  followed  by  drying  and  resuspension 
can remove many of these substances. While relatively pure DNA is required for
efficient restriction enzyme digestion, addition of acetylated BSA to a final
concentration of 0.1 mg/ml can sometimes improve the quality and efficiency of enzyme
assays containing impure DNA, and we recommend that it be included in all digests.
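
The EDTA carry-over described above is a simple dilution. The sketch below estimates the final EDTA concentration in a 20 µl digest for two volumes of DNA stored in 1 mM EDTA. The DNA volumes are hypothetical, and no inhibition threshold is implied, since the text does not give one.

```python
def final_concentration(stock_conc: float, stock_volume_ul: float,
                        total_volume_ul: float) -> float:
    """Concentration after diluting stock_volume_ul of stock into total_volume_ul.

    The result is in the same units as stock_conc.
    """
    if total_volume_ul <= 0:
        raise ValueError("total volume must be positive")
    if not 0 <= stock_volume_ul <= total_volume_ul:
        raise ValueError("stock volume must be between 0 and the total volume")
    return stock_conc * stock_volume_ul / total_volume_ul


EDTA_STOCK_MM = 1.0      # EDTA concentration of the DNA storage buffer (mM)
REACTION_VOLUME_UL = 20.0

# Hypothetical cases: a concentrated DNA prep versus a dilute one.
edta_small = final_concentration(EDTA_STOCK_MM, 3.0, REACTION_VOLUME_UL)
edta_large = final_concentration(EDTA_STOCK_MM, 15.0, REACTION_VOLUME_UL)

print(f"3 ul DNA:  {edta_small:.2f} mM EDTA in the digest")
print(f"15 ul DNA: {edta_large:.2f} mM EDTA in the digest")
```

Concentrating dilute DNA first reduces the volume added and therefore the EDTA carried into the reaction.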

Enzyme  Storage,  Handling  and  Use 

Maintain  the  sterility  of  reagents  used  in  the  RE  digest  as  well  as  any  tools  (e.g.,  tubes, 
pipette tips) used with those reagents. Restriction enzymes should be stored in a
non-frost-free freezer, except for a brief period during use, when they should be kept on ice.
The  restriction  enzyme  is  usually  the  last  reagent  added  to  a  reaction,  to  ensure  that  it  is 
not  exposed  to  extreme  conditions.  When  many  similar  digests  are  being  prepared,  it 
may  be  convenient  to  create  premixes  of  common  reagents. 

Before  assembling  the  restriction  digest,  thoroughly  mix  each  component  to  be  added  to 
the  reaction  and  then  centrifuge  the  tubes  of  reagents  briefly  to  collect  the  contents  in  the 
bottom  of  the  tube.  The  reaction  components  should  also  be  mixed  after  addition  of  the 
enzyme  to  the  digest.  While  high  salt  buffers  and  glycerol-containing  reagents  are 
difficult to mix, all solutions containing restriction enzymes must be mixed gently to
avoid  inactivating  the  enzyme. 

Setting  up  a  Restriction  Enzyme  Digest  (adapted  from  Promega  protocol) 

An analytical scale restriction enzyme digest is usually performed in a volume of 20 µl
on 0.2-1.5 µg of substrate DNA, using a two- to tenfold excess of enzyme over DNA. If
an  unusually  large  volume  of  DNA  or  enzyme  is  used,  aberrant  results  may  occur  and 
may  not  be  readily  recognized.  The  following  is  the  protocol  followed  for  this  research: 
1. Turn on the 37°C water bath.


2. Put BSA, Buffer H, and EcoRI on ice to thaw. Put DNA from selected samples in a tube
holder  to  thaw. 

3. Add the ingredients one at a time, as follows, to an Eppendorf tube. Don’t forget to label
each tube by sample and add RD to the label to denote that it is a restriction digest.

14.3 µl  distilled water
 2.0 µl  Buffer H
 3.0 µl  DNA
 0.2 µl  BSA
 0.5 µl  EcoRI
Total volume: 20 µl

4. Place all restriction digest samples in the 37°C water bath for 2-3 hours.
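
When many digests are set up at once, the per-tube recipe in step 3 can be scaled into a premix of the shared reagents, with DNA added to each tube separately. The sketch below is one way to do the arithmetic; the 10% overage for pipetting loss and the 10 mg/ml BSA stock concentration are assumptions, not values taken from the protocol.

```python
# Per-reaction recipe from step 3, in microliters. Order is kept so the
# enzyme stays last, matching the advice to add it last.
RECIPE_UL = {
    "distilled water": 14.3,
    "Buffer H": 2.0,
    "DNA": 3.0,
    "BSA": 0.2,
    "EcoRI": 0.5,
}


def premix_volumes(recipe, n_reactions, per_sample=("DNA",), overage=0.10):
    """Scale the shared reagents for n_reactions, leaving per-sample items out."""
    factor = n_reactions * (1.0 + overage)
    return {name: round(volume * factor, 2)
            for name, volume in recipe.items() if name not in per_sample}


def premix_per_tube(recipe, per_sample=("DNA",)):
    """Volume of premix to dispense into each tube before adding the DNA."""
    return round(sum(v for k, v in recipe.items() if k not in per_sample), 2)


total_ul = round(sum(RECIPE_UL.values()), 2)
mix = premix_volumes(RECIPE_UL, n_reactions=12)
aliquot_ul = premix_per_tube(RECIPE_UL)

# Assumes a 10 mg/ml BSA stock; the protocol does not state the stock concentration.
bsa_final_mg_per_ml = 10.0 * RECIPE_UL["BSA"] / total_ul

print(f"reaction volume: {total_ul} ul")
for name, volume in mix.items():
    print(f"{name}: {volume} ul")
print(f"dispense {aliquot_ul} ul premix per tube, then add {RECIPE_UL['DNA']} ul DNA")
print(f"final BSA: {bsa_final_mg_per_ml:.2f} mg/ml")
```

With the assumed stock, the 0.2 µl of BSA in a 20 µl reaction works out to the 0.1 mg/ml final concentration recommended earlier in this appendix.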

Experimental  Controls 

Experimental  controls  are  necessary  to  identify,  understand  and  explain  problems  or 
inconsistencies  in  results.  The  following  controls  are  commonly  used  in  parallel  with  RE 
digests:  (i)  uncut  experimental  DNA,  (ii)  digest  of  commercially  supplied  control  DNA, 
(iii) no-enzyme “mock” digest, (iv) one or two different size markers in more than one lane
per gel (i.e., different locations).
Appendix G: Gels


F11.L1 Gel: Lane 1-100 bp ladder; Lane 2-F11.L1.5.24; Lane 3-F11.L1.6.12; Lane 4-F11.L1.1.24;
Lane  5-F11.L1.1.36;  Lane  6-F11.L1.3.23;  Lane  7-F11.L1.1.21;  Lane  8-F11.L1.3.24;  Lane  9-A21.3.10; 
Lane  10-F11.L1.2.26;  Lane  11-A21.3.21;  Lane  12-F11.L1.2.22;  Lane  13-A21.3.23;  Lane  14-100  bp 
ladder 
[Gel image labels: Plasmid Band; Insert]

M11-1.L1 Gel: Lane 1-M11-1.L1.1.1; Lane 2-M11-1.L1.1.4; Lane 3-M11-1.L1.1.11;
Lane 4-M11-1.L1.2.3; Lane 5-100 bp ladder; Lane 6-M11-1.L1.2.16; Lane 7-M11-1.L1.3.8;
Lane 8-M11-1.L1.4.2; Lane 9-M11-1.L1.4.4; Lane 10-Empty; Lane 11-M11-1.L1.4.16
[Gel image labels: Plasmid Band; Insert]

Ap53.L1 Gel: Lane 1-Ap53.L1.5.6; Lane 2-Ap53.L1.3.18; Lane 3-Ap53.L1.3.14;
Lane 4-Ap53.L1.3.10; Lane 5-100 bp ladder; Lane 6-Ap53.L1.2.5; Lane 7-Ap53.L1.3.2;
Lane 8-Ap53.L1.5.18; Lane 9-Ap53.L1.5.14; Lane 10-Ap53.L1.5.13




My62.L1 and EZNA Gel: Lane 1-100 bp ladder; Lane 2-My62.L1.1.1; Lane 3-My62.L1.1.2;
Lane 4-EZNA (other research); Lane 5-EZNA (other research); Lane 6-My62.L1.1.21;
Lane 7-My62.L1.2.21; Lane 8-My62.L1.3.21; Lane 9-My62.L1.4.21; Lane 10-My62.L1.5.21
Appendix H: Richness Estimator and Rarefaction Curves
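
As a reading aid for the curves titled in this appendix, the sketch below shows how a Chao1 richness estimate and a rarefaction value can be computed from per-OTU sequence counts. It assumes the bias-corrected form of Chao1 and the analytical (hypergeometric) rarefaction formula; it is not the software used in this research, the ACE estimator is omitted, and the example counts are hypothetical.

```python
from math import comb


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    counts = [c for c in counts if c > 0]
    s_obs = len(counts)                          # observed OTUs
    f1 = sum(1 for c in counts if c == 1)        # singletons
    f2 = sum(1 for c in counts if c == 2)        # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def rarefied_richness(counts, n):
    """Expected number of OTUs in a random subsample of n sequences."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the total count")
    denom = comb(total, n)
    # An OTU with c sequences is missed with probability C(total - c, n) / C(total, n).
    return sum(1 - comb(total - c, n) / denom for c in counts)


# Hypothetical OTU table: number of sequences assigned to each OTU.
example_counts = [5, 3, 1, 1, 1, 2, 2]

estimate = chao1(example_counts)
curve = [(n, rarefied_richness(example_counts, n))
         for n in range(0, sum(example_counts) + 1, 5)]

print(f"Chao1 estimate: {estimate}")
for n, expected in curve:
    print(f"{n:>3} sequences: {expected:.2f} expected OTUs")
```

A rarefaction curve that is still rising at the full sample size, or a Chao1 estimate well above the observed OTU count, suggests that additional sequencing would recover more OTUs.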


All Data ACE (97%)
All Data ACE (80%)
All Data Chao1 (97%)
All Data Chao1 (80%)
All Data Rarefaction (95% every 10)
All Data Rarefaction (80% every 10)

Planted ACE (97%)
Planted Chao1 (97%)
Planted Rarefaction (97% every 10)

Blank ACE (97%)
Blank ACE (80%)
Blank Chao1 (97%)
Blank Rarefaction (96% every 10)
Blank Rarefaction (80% every 10)

Carex comosa ACE Richness Estimator Curve (97%) (x-axis: # of Sequences, 0-500)
Carex comosa Chao1 Richness Estimator Curve (97%)
Carex comosa Rarefaction Curve (97% every 10)
Carex comosa Rarefaction Curve (80% every 10)

ACE (97%)
Chao1 (97%)
Rarefaction (97% every 10)
Rarefaction (80% every 10)

Scirpus atrovirens ACE Richness Estimator (97%)
Scirpus atrovirens Chao1 Richness Estimator Curve (97%)
Scirpus atrovirens Chao1 Richness Estimator Curve (80%)
Scirpus atrovirens Rarefaction Curve (97% every 10)
Scirpus atrovirens Rarefaction Curve (80% every 10)